Abstract

Usually, probabilistic automata and probabilistic grammars have crisp symbols as
inputs, which can be viewed as the formal models of computing with values. In this paper,
we first introduce probabilistic automata and probabilistic grammars for computing
with (some special) words in a probabilistic framework, where the words are interpreted
as probability distributions or possibility distributions over a set of crisp symbols. By
probabilistic conditioning, we then establish a retraction principle from computing with
words to computing with values for handling crisp inputs and a generalized extension
principle from computing with words to computing with all words for handling arbitrary
inputs. These principles show that computing with values and computing with
all words can be respectively implemented by computing with some special words. To
compare the transition probabilities of two near inputs, we also examine some analytical
properties of the transition probability functions of generalized extensions. Moreover,
the retractions and the generalized extensions are shown to be equivalence-preserving.
Finally, we clarify some relationships among the retractions, the generalized extensions,
and the extensions studied recently by Qiu and Wang.

Keywords: Computing with words, equivalence, extension principle, probabilistic
automata, probabilistic grammars.
1 Introduction



To capture the notion of automated reasoning involving linguistic terms rather than 
numerical quantities, Zadeh has proposed and advocated the idea of computing with 
words in a series of papers |2S1 123 ESI 123 EHl EH] • The objects of computing with words 
are words and propositions which describe perceptions in a natural language, where the 
words play the role of labels of perceptions. For the purpose of computing, the meaning 
of a proposition is expressed as a generalized constraint. Many basic types of constraints 
have already been given by Zadeh; among others, possibilistic constraint characterized 
by fuzzy sets (possibility distributions) and probabilistic constraint characterized by 
probabilistic distributions are two most familiar ones. As a methodology, computing 
with words has provided a foundation for dealing with imprecise, uncertain, and partially 
true data which have the form of propositions expressed in a natural language; see
13 ini El inO] for some applications. 

Based upon the generalized constraints, one can manually handle some computa- 
tions and uncertain reasoning on perceptions. However, computing, in its traditional 
sense, is centered on the manipulation of numbers and symbols, and is usually repre- 
sented by a dynamic model in which an input device is equipped. It is well known 
that various automata, such as Turing machines, deterministic and nondeterministic 
finite automata, probabilistic automata, and fuzzy automata, are the prime examples of 
classical computational systems. Note that the inputs of such models are exact rather 
than vague data, and thus they cannot serve as formal models of computing with words. 
This observation motivated Ying to interpret "computing with words" as a computational
procedure where inputs are allowed to be vague data [22]. In this sense, it is,
however, not easy to implement the computations on perceptions, because we do not
yet know which formal models are suited to these computations. Of course, this may
be alleviated by providing more candidate models. But even if a formal model is picked 
out, how to select some words as inputs remains difficult since usually the designer only 
provides a specification for finitely many words. On the other hand, if a word W is selected as
an input, then a word W' near to W should also be selected because the description of
a perception in a natural language is generally not precise and we have no excuse for 
rejecting a similar label of the perception as an input. This consideration puts us in a 
dilemma since allowing similar words as inputs will lead to an infinite input alphabet. 

Most of the literature on computing with words is devoted to developing new com- 
putationally feasible algorithms for uncertain reasoning; however, to our knowledge, few 
efforts have been made to consider the formal theory of computing with words except 
the work [22, 19, 12]. In [22], Ying proposed a formal model of computing with words
in terms of fuzzy automata. Fuzzy automata initiated by Santos [15] are a generalization
of nondeterministic finite automata, in which state transitions are imprecise and
uncertain. The point of departure in [22] is a fuzzy automaton where inputs are crisp
symbols. These symbols may be reasonably thought of as the input values that we are
going to compute. Following [22], we identify a value with a symbol from the input
alphabet and also a word with a probabilistic distribution or a fuzzy subset of the input 
alphabet, and use them exchangeably. By exploiting Zadeh's extension principle, the 
fuzzy automaton gives rise to another fuzzy automaton that has all fuzzy subsets of 
the set of the symbols as inputs and models formally computing with words. The key 
idea underlying Ying's formal model of computing with words is the use of words in 
place of values as input symbols of a fuzzy automaton. Motivated by this, Wang and 
Qiu extended the concept of computing with words to fuzzy Turing machines and some 
formal grammars in [19]; moreover, they investigated the formal theory of computing
with words in the framework of probabilistic automata and probabilistic grammars [12],
where the words are interpreted as probabilistic distributions. 

Essentially, the constructions of all the formal models of computing with words in
[22, 19, 12] go as follows: beginning with a classical computational model with values
as inputs and then deriving a formal model with all words (interpreted as probability
distributions or possibility distributions) as inputs. Consequently, the resultant formal
model for computing with words inevitably depends on the underlying classical computational
model of computing with values. This observation suggests that we seek a general
formal model of computing with words. To this end, we introduced the notion of fuzzy
automata for computing with words (FACWs) in [2]. An FACW is a fuzzy automaton
where the input alphabet consists of finitely many words (fuzzy subsets) over some crisp symbols.
In order to deal with arbitrary words that may not be in the input alphabet of an
FACW as inputs, we established the so-called retractions and generalized extensions of
FACWs by exploiting the methodology of fuzzy control.

Figure 1: Interrelation among retractions, extensions, and generalized extensions.

As mentioned above, the probabilistic models of computing with words in [12] also
suffer from the dependence on the underlying classical computational models. The purpose
of this paper is thus to build a general probabilistic model of computing with words.
We introduce probabilistic automata for computing with words (PACWs) as well as
probabilistic grammars for computing with words (PGCWs) to model formally computing
with words in a probabilistic framework. Probabilistic automata and probabilistic
grammars have been studied since the early 1960s [11]. Relevant to our line of interest
is the work of Rabin [13]. In the present paper, the words that represent generalized
constraints are interpreted as probabilistic distributions or possibility distributions over 
some crisp symbols. 

We may think that PACWs and PGCWs are specified by experts, in which only finite 
words are considered. For example, an expert may express his opinion on a repeated 
risk investment of a firm in the following proposition: If the firm is in a good situation 
and if it invests in the projects A and B with probabilities 0.7 and 0.3, respectively, 
then it will be still in the good situation with probability 0.9 while in a bad situation 
with probability 0.1. Based upon some analogous propositions, we can build a PACW or 
PGCW to represent the expert's opinions. In practice, in many areas expert's opinions 
may be naturally expressed in terms of linguistic uncertainties. Clearly, it is desirable if 
from such knowledge, we can make inferences about some particular actions that are not 
specified by the experts. For instance, one may want to assess the situations of the firm 
when it invests in the projects A and B with probabilities 0.75 and 0.25, respectively, or 
it invests in the project A with probability 1. This motivates us to consider the so-called 
retractions and generalized extensions of PACWs and PGCWs. 

Roughly speaking, the retraction of a PACW is a probabilistic automaton, called 
probabilistic automaton for computing with values (PACV), that has crisp symbols as 
inputs; while the generalized extension of a PACW is another probabilistic automaton,
called probabilistic automaton for computing with all words (PACAW), that can accept
any words as inputs (see Figure 1). As we will see, the extension from PACVs to
PACAWs developed in [12] is a special case of generalized extensions. By probabilistic
conditioning rather than the methodology of fuzzy control used in [2], we establish a 
retraction principle from computing with words to computing with values for dealing 
with crisp inputs and a generalized extension principle from computing with words to 
computing with all words for dealing with arbitrary inputs. These principles show that 
in the probabilistic framework, computing with values and computing with all words can 
be respectively implemented by computing with some special words. From a modeling 
viewpoint, the generalized extensions enable infinitely possible inputs to be represented 
by finite inputs by means of interpolation. Analogously, we investigate the retractions 
and the generalized extensions of PGCWs. Furthermore, we show that the retractions 
and the generalized extensions preserve all the three kinds of equivalences among PACWs 
and PGCWs consisting of the equivalence between PACWs, the equivalence between 
PGCWs, and the equivalence between PACWs and PGCWs. 

The present work is developed closely along the lines of [2], because in our opinion,
like the studies on probabilistic automata and fuzzy automata in history, the proba- 
bilistic models of computing with words deserve a study similar to that of fuzzy model 
of computing with words. It is worth noting that although probabilistic automata and 
fuzzy automata are formally similar, they have different semantics and can satisfy di- 
verse applications; see, for example, [HI ^1 ^J and the bibliographies therein. Based on 
the complementarity of fuzzy logic and probability theory (see [18, 24]), probabilistic
models and fuzzy models for computing with words may complement each other. In the 
paper, we pay more attention to some aspects that are not or may not be considered for 
fuzzy models in [2], such as formal grammars, the linearity of the transition probability
function of a generalized extension, and analytical properties comparing the transition 
probabilities of two near inputs. 

The rest of the paper is organized as follows. In Section 2, after presenting some 
preliminaries in probabilistic automata and probabilistic grammars, we introduce two 
probabilistic models of computing with words, PACWs and PGCWs. The retractions of 
PACWs are established in Section 3. We develop the generalized extensions of PACWs 
and discuss some related analytical properties in Section 4. Section 5 is concerned with 
the retractions and the generalized extensions of PGCWs, and Section 6 is devoted to 
the equivalence preservation under the retractions and the generalized extensions. Some 
relationships among the retractions, the generalized extensions, and the extensions in 
[12] are explored in Section 7. We conclude the paper and identify some future research
directions in Section 8. 



2 Probabilistic models of computing with words 

After recalling some basics of probabilistic automata and the extensions in [12], we
introduce the notion of probabilistic automata for computing with words in Section 2.1. 
In a parallel way, we define probabilistic grammars for computing with words in Section 
2.2. For later need, the equivalence between probabilistic automata and probabilistic 
grammars is briefly reviewed in Section 2.3. 
2.1 Probabilistic automata for computing with words

To introduce a formal probabilistic model of computing with words, let us first review 
some notions on probabilistic automata. 

We begin with some notations. Let $\Omega$ be a finite set. A function $\mu$ from $\Omega$ to the
closed unit interval $[0, 1]$ is called a probability distribution on $\Omega$ if $\sum_{x \in \Omega} \mu(x) = 1$. The
set $\{x \in \Omega : \mu(x) > 0\}$ is called the support of $\mu$ and is denoted by $\mathrm{supp}(\mu)$. For any
$x \in \Omega$, we use $\hat{x}$ to denote the unique probability distribution with $\hat{x}(x) = 1$, also known
as the Dirac distribution for $x$. By $\mathcal{D}(\Omega)$ we denote the set of all probability distributions
on the set $\Omega$. If $\mu$ is a probability distribution with support $\{x_1, \ldots, x_n\}$, we sometimes
write fj, in Zadeh's notation J2H] as 

$$\mu = \mu(x_1)\backslash x_1 + \mu(x_2)\backslash x_2 + \cdots + \mu(x_n)\backslash x_n.$$

With this notation, $\hat{x} = 1\backslash x$. For any $\lambda \in [0, 1]$ and $\mu \in \mathcal{D}(\Omega)$, we define a scalar
multiplication $\lambda \cdot \mu : \Omega \to [0, 1]$ by $(\lambda \cdot \mu)(x) = \lambda \cdot \mu(x)$ for all $x \in \Omega$, where the dot "$\cdot$"
in $\lambda \cdot \mu(x)$ stands for the product of $\lambda$ and $\mu(x)$. We overload the notation "$\cdot$" from now
on, since the context will avoid ambiguity. Clearly, the scalar multiplication $\lambda \cdot \mu$ is not
necessarily a probability distribution on $\Omega$.

For later need, let us briefly review the notion of fuzzy subsets. Each fuzzy subset 
(or simply fuzzy set), A, is defined in terms of a relevant universal set X by a function 
assigning to each element x of X a value A{x) in [0, 1]. A fuzzy subset of X can be used 
to formally represent a possibility distribution on $X$. We denote by $\mathcal{F}(X)$ the set of all
fuzzy subsets of X. 

Recall that a deterministic finite automaton is a five-tuple $\mathcal{A} = (Q, \Sigma, \delta, q_0, F)$, where
$Q$ is a finite set of states, $\Sigma$ is a finite input alphabet, $q_0 \in Q$ is the initial state, $F \subseteq Q$ is
the set of final states, and $\delta$ is a mapping from $Q \times \Sigma$ to $Q$. The language accepted by $\mathcal{A}$
is defined as $L(\mathcal{A}) = \{s \in \Sigma^* : \delta(q_0, s) \in F\}$. As a generalization of deterministic finite
automata, Rabin introduced the following probabilistic automata in the early 1960s [13].

Definition 2.1. A probabilistic automaton is a five-tuple $\mathcal{M} = (Q, \Sigma, \delta, q_0, F)$, where:

(a) $Q$ is a finite set of states;

(b) $\Sigma$ is a finite input alphabet;

(c) $q_0 \in Q$ is the initial state;

(d) $F \subseteq Q$ is the set of final states;

(e) $\delta$, the transition probability function, is a function from $Q \times \Sigma$ to $\mathcal{D}(Q)$ that takes a
state in $Q$ and an input symbol in $\Sigma$ as arguments and returns a probability distribution
on $Q$.

When the probabilistic automaton is in state $p \in Q$ and the input is $\sigma \in \Sigma$, it
can go into any one of the states $q \in Q$, and the probability of going into $q$ is $\delta(p, \sigma)(q)$.
Thus, the probabilistic automaton has a definite transition probability for entering state $q$
from state $p$ when receiving a string (i.e., a sequence of inputs). To give this probability,
we define inductively an extended transition probability function from $Q \times \Sigma^*$ to $\mathcal{D}(Q)$,
denoted by the same notation $\delta$, as follows:

$$\delta(p, \epsilon) = 1\backslash p,$$
$$\delta(p, s\sigma) = \sum_{q \in Q} \delta(p, s)(q) \cdot \delta(q, \sigma)$$

for all $s \in \Sigma^*$ and $\sigma \in \Sigma$, where $\Sigma^*$ consists of all finite strings (including the empty
string $\epsilon$) over $\Sigma$, $1\backslash p$ is the Dirac distribution for $p$, and $\delta(p, s)(q) \cdot \delta(q, \sigma)$ is the scalar
multiplication of the scalar $\delta(p, s)(q)$ and the probability distribution $\delta(q, \sigma)$. It is not
hard to check that $\delta(p, s\sigma)$ is a probability distribution on $Q$.

The language accepted by the probabilistic automaton $\mathcal{M}$ is defined as a function
$L(\mathcal{M}) : \Sigma^* \to [0, 1]$ in the following way: For any $s \in \Sigma^*$,

$$L(\mathcal{M})(s) = \sum_{q \in F} \delta(q_0, s)(q).$$

The value $L(\mathcal{M})(s)$ is exactly the probability for $\mathcal{M}$, when started in $q_0$, to go into a
state in $F$ by the input $s$, called the accepting probability of $s$ by $\mathcal{M}$.
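The inductive definition of the extended transition function and the accepting probability can be illustrated with a minimal Python sketch. The two-state automaton below is hypothetical and chosen only for illustration; the names `step` and `accept_probability` are likewise assumptions and do not come from the paper.

```python
from typing import Dict, List, Set

Dist = Dict[str, float]                    # state -> probability
Delta = Dict[str, Dict[str, Dist]]         # state -> symbol -> distribution


def step(dist: Dist, delta: Delta, symbol: str) -> Dist:
    """Inductive rule: sum over q of dist(q) * delta(q, symbol)."""
    out: Dist = {}
    for q, weight in dist.items():
        for r, prob in delta[q][symbol].items():
            out[r] = out.get(r, 0.0) + weight * prob
    return out


def accept_probability(delta: Delta, q0: str, final: Set[str], s: List[str]) -> float:
    """L(M)(s): probability of ending in a final state after reading s."""
    dist: Dist = {q0: 1.0}                 # base case: Dirac distribution for q0
    for symbol in s:
        dist = step(dist, delta, symbol)
    return sum(dist.get(q, 0.0) for q in final)


# Hypothetical PACV with states p, q over the alphabet {a, b} (not from the paper).
TOY_DELTA: Delta = {
    "p": {"a": {"p": 0.5, "q": 0.5}, "b": {"p": 1.0}},
    "q": {"a": {"q": 1.0}, "b": {"p": 0.25, "q": 0.75}},
}

print(accept_probability(TOY_DELTA, "p", {"q"}, ["a", "b"]))
```

For the string `ab` the distribution after `a` is (0.5, 0.5), and reading `b` leaves mass 0.5 · 0.75 = 0.375 on the final state `q`.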

The symbols from an input alphabet are usually viewed as exact input values. In this 
sense, the above definition provides a model of computing with values through probabilistic
automata. Therefore, we shall refer to the probabilistic automaton in Definition
2.1 as a probabilistic automaton for computing with values (or PACV for short). In contrast,
words in the natural languages are the descriptions of some imprecise values; they
can be formally represented as probability distributions or possibility distributions over 
a certain underlying set. Following Zadeh's opinion in |^, we interpret a word over an 
input alphabet S as a probability distribution (resp. a possibility distribution) on S. In 
this sense, computing with words in this paper is concerned with formal computation 
whose input is a string of probability distributions (resp. possibility distributions) on 
an underlying input alphabet, instead of a string of symbols from the underlying input 
alphabet. 

In the literature of formal language theory, a string is often called a "word". To 
avoid confusion in the present paper, we do not use the term "word" in this way and 
only use it to refer to what we mean by "word" in the phrase "computing with words." 
For clarity, we develop our work along the probability interpretation of words and point 
out several necessary modifications required to deal with the possibility interpretation of 
words. Thus, in what follows the term "word" means a probability distribution, unless 
otherwise specified. 

Motivated by Ying's formal model of fuzzy automata for computing with words 
[22], Qiu and Wang [12] proposed a probabilistic model of computing with words via
probabilistic automata. This was done by extending further the transition probability 
function of a PACV as follows. 

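Definition 2.2. Let $\mathcal{M} = (Q, \Sigma, \delta, q_0, F)$ be a PACV.

(a) To handle words as inputs, $\delta$ is extended to a function from $Q \times \mathcal{D}(\Sigma)$ to $\mathcal{D}(Q)$,
denoted $\hat{\delta}$, as follows:

$$\hat{\delta}(p, W) = \sum_{\sigma \in \Sigma} W(\sigma) \cdot \delta(p, \sigma)$$

for any $p \in Q$ and $W \in \mathcal{D}(\Sigma)$. It is easy to verify that $\hat{\delta}(p, W) \in \mathcal{D}(Q)$, and we thus
get a probabilistic automaton $\hat{\mathcal{M}} = (Q, \mathcal{D}(\Sigma), \hat{\delta}, q_0, F)$ with an infinite input alphabet.
With the interpretation of words in terms of probability distributions, $\hat{\mathcal{M}}$ can serve as
a probabilistic, formal model of computing with all words over $\Sigma$. For convenience, we
sometimes refer to $\hat{\mathcal{M}}$ as the extension of $\mathcal{M}$, which corresponds to the process (c) in
Figure 1.

(b) To handle strings of words as inputs, $\hat{\delta}$ in (a) is further extended to a function from
$Q \times \mathcal{D}(\Sigma)^*$ to $\mathcal{D}(Q)$, denoted again by $\hat{\delta}$, as follows:

$$\hat{\delta}(p, \epsilon) = 1\backslash p,$$
$$\hat{\delta}(p, SW) = \sum_{q \in Q} \hat{\delta}(p, S)(q) \cdot \hat{\delta}(q, W)$$

for all $S \in \mathcal{D}(\Sigma)^*$ and $W \in \mathcal{D}(\Sigma)$.

(c) The word language $L_w(\hat{\mathcal{M}})$ accepted by $\hat{\mathcal{M}}$ is a function from $\mathcal{D}(\Sigma)^*$ to $[0, 1]$ defined
by

$$L_w(\hat{\mathcal{M}})(S) = \sum_{q \in F} \hat{\delta}(q_0, S)(q)$$

for all $S \in \mathcal{D}(\Sigma)^*$. The numerical value $L_w(\hat{\mathcal{M}})(S)$ is the probability that $\hat{\mathcal{M}}$ accepts
the string of words $S$.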
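As a rough illustration of part (a) of Definition 2.2, the following Python sketch mixes the crisp transition distributions of a PACV according to the weights of a word. The two-state automaton and the function name `extend` are hypothetical and serve only as an example.

```python
from typing import Dict

Dist = Dict[str, float]                    # element -> probability
Delta = Dict[str, Dict[str, Dist]]         # state -> symbol -> distribution


def extend(delta: Delta, p: str, word: Dist) -> Dist:
    """Extension to words: sum over sigma of W(sigma) * delta(p, sigma)."""
    out: Dist = {}
    for sigma, weight in word.items():
        for q, prob in delta[p][sigma].items():
            out[q] = out.get(q, 0.0) + weight * prob
    return out


# Hypothetical PACV over {a, b} (not from the paper).
TOY_DELTA: Delta = {
    "p": {"a": {"p": 0.5, "q": 0.5}, "b": {"p": 1.0}},
    "q": {"a": {"q": 1.0}, "b": {"p": 0.25, "q": 0.75}},
}

WORD = {"a": 0.9, "b": 0.1}                # the word 0.9\a + 0.1\b
MIXED = extend(TOY_DELTA, "p", WORD)       # 0.9 * delta(p, a) + 0.1 * delta(p, b)
DIRAC = extend(TOY_DELTA, "p", {"a": 1.0})  # a Dirac word reproduces delta(p, a)
```

Because the extension is a convex combination of probability distributions, the result again sums to 1; feeding a Dirac word recovers the crisp transition, which is the sense in which the extension agrees with the original PACV on values.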

As we see from Definition 2.2, the probabilistic model of computing with words is
essentially a probabilistic automaton which is the same as the probabilistic automaton
in Definition 2.1. Importantly, however, the strings of inputs are different: In Definition
2.1, they are strings of values, whereas in Definition 2.2, they are strings of words. It
is worth noting that the input alphabet of $\hat{\mathcal{M}}$ consists of all probability distributions
on $\Sigma$ and the transition probability function $\hat{\delta}$ in (a) of Definition 2.2 depends on the
underlying probabilistic automaton $\mathcal{M}$ as well. For this reason, we introduce a somewhat
general probabilistic model of computing with words.

Definition 2.3. A probabilistic automaton for computing with words (or PACW for
short) is a probabilistic automaton $\mathcal{M}_w = (Q, \Sigma_w, \delta, q_0, F)$, where the components
$Q, q_0, F$ have the same interpretation as in Definition 2.1 and the following hold:

(b') $\Sigma_w$ is a finite subset of $\mathcal{D}(\Sigma)$, where $\Sigma$ is a finite set of symbols, called the underlying
input alphabet.

(e') $\delta$ is a transition probability function from $Q \times \Sigma_w$ to $\mathcal{D}(Q)$.

The new features of the model in Definition 2.3 are that the input alphabet consists
of some (not necessarily all) probability distributions over a finite set of symbols (i.e.,
the underlying input alphabet) and the transition probability function can be specified
arbitrarily. In particular, when $\Sigma_w = \mathcal{D}(\Sigma)$, we say that the PACW is a probabilistic
automaton for computing with all words (or PACAW for short). The choice of $\Sigma_w$ and
the specification of the transition probability function $\delta$ are provided by an expert from
experiment or intuition. The definition of the language accepted by a PACV is applicable
to PACWs, and we thus get a direct way of computing with strings of words.

The following is a simple example coming from game theory. The reader who is not 
familiar with basic notions of game theory is referred to the standard textbook [5].

Example 2.4. Let us see the famous prisoner's dilemma game. It goes as follows: Two 
suspects have been accused of collaborating in a crime. They are in separate jail cells and 
cannot communicate with each other. Each has been asked to confess. If both suspects 
confess, each will receive a prison term of 3 years. If neither confesses, both will be 
released on account of insufficient evidence. On the other hand, if one suspect confesses 
and the other does not, the one who confesses will receive a term of only 1 year, while
the other will go to prison for 5 years. The payoff matrix in Table 1 summarizes the
possible outcomes, where, for example, the entry in the upper left-hand corner means a
three-year sentence for each suspect.

Figure 2: A probabilistic automaton for computing with words. [State-transition diagram over $q_0$, $q_1$, $q_2$ with edges labeled $W_k|$probability; the same transition probabilities are listed in Example 2.10.]



                    Confess     Do not confess
Confess             -3, -3      -1, -5
Do not confess      -5, -1       0,  0

Table 1: Payoff matrix for prisoner's dilemma.

As the table shows, every suspect faces a dilemma. If they could both agree not to 
confess, then each would be released. But they cannot talk to each other, and even if 
they could, they may not trust each other. If one of them does not confess, he risks being 
taken advantage of by his former accomplice. In fact, the prisoner's dilemma game is a 
model of many situations in real life. For example, in oligopolistic markets, firms often 
find themselves in a prisoner's dilemma game when making output or pricing decisions. 
We suppose that the two suspects may be accused many times (i.e., they are playing 
repeated games); for simplicity, we also assume that they prefer to play with tit-for-tat 
strategy: each suspect starts by "Do not confess," and thereafter prefers to choose in 
round j + 1 the action chosen by his accomplice in round j. 

We want to give a PACW to describe a suspect's dilemma. By assumption, the
suspect being described considers only three mental states, say $q_0$, $q_1$, and $q_2$. For
instance, $q_0$ might say "neither will confess", $q_1$ might say "only one will confess", and $q_2$
might say "both will confess." We use $a$ and $b$ to denote the strategies "Do not confess"
and "Confess", respectively. Because the suspect faces a dilemma, it is difficult for him
to make a specific choice between $a$ and $b$. We thus suppose that the suspect makes a
random choice between the two possible strategies, based on a set of chosen probabilities.
In other words, the suspect is supposed to adopt mixed strategies. (Strategies of this
kind arise naturally in repeated games.) For instance, we consider two mixed strategies
$W_1 = 0.9\backslash a + 0.1\backslash b$ and $W_2 = 0.1\backslash a + 0.9\backslash b$. The strategy $W_1$ means that the suspect
chooses $a$ with probability 0.9, while $W_2$ means that the suspect chooses $b$ with probability
0.9. The transition probabilities in Figure 2 describe how the suspect's belief changes with his
strategies. For example, the loop with label $W_1|0.75$ means that the suspect
believes his state is $q_0$ with probability 0.75 if he chooses the mixed strategy $W_1$ at the
initial state $q_0$, and actually, the game may not start.

Let $\Sigma = \{a, b\}$. Then $W_1$ and $W_2$ are two words over $\Sigma$. Take $Q = \{q_0, q_1, q_2\}$,
$\Sigma_w = \{W_1, W_2\}$, and $F = \{q_2\}$. The transition probability function $\delta : Q \times \Sigma_w \to \mathcal{D}(Q)$
follows from Figure 2. We then get a PACW $(Q, \Sigma_w, \delta, q_0, F)$, denoted by $\mathcal{M}$. The word
language accepted by $\mathcal{M}$ is given by $L_w(\mathcal{M})(S) = \delta(q_0, S)(q_2)$ for all $S \in \Sigma_w^*$. For
example,

$$L_w(\mathcal{M})(W_1W_1) = 0.05, \quad L_w(\mathcal{M})(W_1W_2) = 0.1975,$$
$$L_w(\mathcal{M})(W_2W_1) = 0.05, \quad L_w(\mathcal{M})(W_2W_2) = 0.43,$$

where $L_w(\mathcal{M})(W_2W_2) = 0.43$ means that the suspect believes his state to be $q_2$ with
probability 0.43 when he chooses the mixed strategy $W_2$ in the first accusation and chooses
the same one in the second accusation.
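The four values above can be checked numerically. The following Python sketch is only an illustration: it assumes the transition probabilities of Figure 2 coincide with those listed for the productions in Example 2.10, and the names `DELTA` and `word_language` are introduced here for convenience.

```python
from itertools import product

STATES = ["q0", "q1", "q2"]

# DELTA[p][W] = probabilities of moving from p to q0, q1, q2 on word W
# (values as listed in Example 2.10, assumed to match Figure 2).
DELTA = {
    "q0": {"W1": [0.75, 0.20, 0.05], "W2": [0.05, 0.85, 0.10]},
    "q1": {"W1": [0.40, 0.55, 0.05], "W2": [0.05, 0.55, 0.40]},
    "q2": {"W1": [0.10, 0.85, 0.05], "W2": [0.05, 0.10, 0.85]},
}


def word_language(string_of_words, start="q0", final=("q2",)):
    """L_w(M)(S): probability of being in a final state after reading S."""
    dist = [1.0 if q == start else 0.0 for q in STATES]   # Dirac distribution
    for word in string_of_words:
        nxt = [0.0] * len(STATES)
        for i, p in enumerate(STATES):
            for j in range(len(STATES)):
                nxt[j] += dist[i] * DELTA[p][word][j]
        dist = nxt
    return sum(dist[STATES.index(q)] for q in final)


for s in product(["W1", "W2"], repeat=2):
    print("".join(s), round(word_language(s), 4))
```

For instance, $L_w(\mathcal{M})(W_1W_2) = 0.75 \cdot 0.1 + 0.2 \cdot 0.4 + 0.05 \cdot 0.85 = 0.1975$, in agreement with the value stated in the example.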



We end this subsection by extending Definition 2.3 to the case of possibility distributions.

Remark 2.5. If the words in Definition 2.3 are interpreted as possibility distributions
over $\Sigma$ (i.e., fuzzy subsets of $\Sigma$), then after replacing $\mathcal{D}(\Sigma)$ with $\mathcal{F}(\Sigma)$, the definitions of
PACWs and PACAWs are still appropriate. In terms of mathematical expressions, one
can regard every probability distribution as a special possibility distribution, but their
semantics are clearly different.

2.2 Probabilistic grammars for computing with words 

Before introducing probabilistic grammars for computing with words, let us recall 
several definitions (see, for example, jS]). 

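Definition 2.6. A grammar is a tuple $G = (V, \Sigma, P, S)$ where $V$ and $\Sigma$ are respectively
finite sets of variables and terminals with $V \cap \Sigma = \emptyset$, $P$ is the set of productions of the
form $\alpha \to \beta$, and $S \in V$ is the starting variable.

The following are some frequently used notations on grammar $G$:

1) $\Sigma^*$ is the set of all finite-length strings over $\Sigma$ (including the empty string $\epsilon$), and
$\Sigma^+ = \Sigma^* \setminus \{\epsilon\}$.

2) $\eta \underset{G}{\Rightarrow} \gamma$ means that there exist $\omega_1, \omega_2 \in (V \cup \Sigma)^*$ and $\alpha \to \beta \in P$ such that
$\eta = \omega_1 \alpha \omega_2$ and $\gamma = \omega_1 \beta \omega_2$.

3) $\eta \underset{G}{\overset{*}{\Rightarrow}} \gamma$ denotes that there is a sequence of strings $\xi_1, \ldots, \xi_n$ such that $\xi_1 = \eta$,
$\xi_n = \gamma$, and $\xi_i \underset{G}{\Rightarrow} \xi_{i+1}$ for all $1 \le i \le n - 1$.

4) The language generated by the grammar $G$ is defined as $L(G) = \{s \in \Sigma^* : S \underset{G}{\overset{*}{\Rightarrow}} s\}$.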

The name $G$ below the arrows will be omitted if the grammar $G$ that is being used
is obvious. The form of productions determines the type of a grammar. It is well
known that regular grammars are equivalent to deterministic finite automata, context-free
grammars are equivalent to pushdown automata, and context-sensitive grammars
are equivalent to linear bounded automata.
There are a large number of probabilistic versions arising from these grammars (see
1^ O I14L I2UJ . and others). For our purpose of illustrating the idea of computing with 
words via grammars, we focus on the following probabilistic grammar, which is closely 
relevant to probabilistic automata being considered in the paper. 

Definition 2.7. A probabilistic grammar is a grammar $G = (V, \Sigma, P, S)$, where each
production is endowed with a probability subject to the following:

• for any $A \in V$ and $a \in \Sigma$, $\sum_{B} \Pr(A \to aB) = 1$, where the sum runs over $\{B \in V : A \to aB \in P\}$;

• $\Pr(A \to \epsilon) \in \{0, 1\}$;

• $\Pr(A \to B) = 0$.

In the light of the first condition above, we see that $\Pr(A \to aB) = \Pr(B|A, a)$,
so we sometimes adopt the latter notation. Unlike usual grammars, a probabilistic
grammar accepts every string with a probability.

Definition 2.8. Let $G = (V, \Sigma, P, S)$ be a probabilistic grammar. The language $L(G)$
generated by $G$ is a function from $\Sigma^*$ to $[0, 1]$ defined by

$$L(G)(\epsilon) = \Pr(S \to \epsilon);$$

$$L(G)(s) = \sum_{\substack{A_1, \ldots, A_l \in V \\ A_l \to \epsilon \in P}} \prod_{i=1}^{l} \Pr(A_i | A_{i-1}, a_i)$$

for any string $s = a_1 \cdots a_l \in \Sigma^+$, in which $A_0 = S$.

Similar to the definition of PACWs, we have the following notion. 

Definition 2.9. A probabilistic grammar for computing with words (PGCW) is a probabilistic
grammar $G_w = (V, \Sigma_w, P, S)$, where all components have the same interpretation
as in Definition 2.7 except that $\Sigma_w$ is now a finite set of probability distributions
over some underlying terminal set $\Sigma$.

The language generated by a PGCW $G_w$, called word language and denoted $L_w(G_w)$,
is defined in the same way as in Definition 2.8.

The following is an example of PGCW arising from Example 2.4.

Example 2.10. Let $\Sigma = \{a, b\}$, $W_1 = 0.9\backslash a + 0.1\backslash b$, and $W_2 = 0.1\backslash a + 0.9\backslash b$. Set
$V = \{q_0, q_1, q_2\}$, $\Sigma_w = \{W_1, W_2\}$, $S = q_0$, and

$$P = \{q_i \to W_k q_j : i, j \in \{0, 1, 2\},\ k \in \{1, 2\}\} \cup \{q_i \to \epsilon : i = 0, 1, 2\}$$

with the probabilities

$\Pr(q_0|q_0, W_1) = 0.75$, $\Pr(q_1|q_0, W_1) = 0.2$, $\Pr(q_2|q_0, W_1) = 0.05$,

$\Pr(q_0|q_1, W_1) = 0.4$, $\Pr(q_1|q_1, W_1) = 0.55$, $\Pr(q_2|q_1, W_1) = 0.05$,

$\Pr(q_0|q_2, W_1) = 0.1$, $\Pr(q_1|q_2, W_1) = 0.85$, $\Pr(q_2|q_2, W_1) = 0.05$,

$\Pr(q_0|q_0, W_2) = 0.05$, $\Pr(q_1|q_0, W_2) = 0.85$, $\Pr(q_2|q_0, W_2) = 0.1$,

$\Pr(q_0|q_1, W_2) = 0.05$, $\Pr(q_1|q_1, W_2) = 0.55$, $\Pr(q_2|q_1, W_2) = 0.4$,

$\Pr(q_0|q_2, W_2) = 0.05$, $\Pr(q_1|q_2, W_2) = 0.1$, $\Pr(q_2|q_2, W_2) = 0.85$,

$\Pr(q_0 \to \epsilon) = 0$, $\Pr(q_1 \to \epsilon) = 0$, $\Pr(q_2 \to \epsilon) = 1$.

We then get a PGCW $(V, \Sigma_w, P, S)$, denoted by $\mathcal{G}$. It is easy to verify that $L_w(\mathcal{G})(s) =
L_w(\mathcal{M})(s)$ for all $s \in \Sigma_w^*$, where $\mathcal{M}$ is the PACW in Example 2.4.

2.3 Probabilistic automata vs. probabilistic grammars 

It is not hard to check that probabilistic automata in Definition 2.1 (i.e., PACVs)
and probabilistic grammars in Definition 2.7 are equivalent. For later need, we record a
construction of the equivalence.

Given a probabilistic grammar $G = (V, \Sigma, P, S)$, the following process generates a
probabilistic automaton $\mathcal{M}_G = (Q, \Sigma, \delta, q_0, F)$ satisfying $L(G)(s) = L(\mathcal{M}_G)(s)$ for
all $s \in \Sigma^*$:

1) Let $Q = V$, $q_0 = S$, and $F = \{A \in V : \Pr(A \to \epsilon) = 1\}$.

2) Define $\delta(A, a)(B) = \Pr(B|A, a)$ for all $A, B \in Q$ and $a \in \Sigma$.

In turn, given a probabilistic automaton $\mathcal{M} = (Q, \Sigma, \delta, q_0, F)$, we can also construct
an equivalent probabilistic grammar $G_{\mathcal{M}} = (V, \Sigma, P, S)$:

1) Let $V = Q$ and $S = q_0$.

2) Let $P = \{A \to aB : A, B \in V, a \in \Sigma\} \cup \{A \to \epsilon : A \in V\}$ and define the
probabilities of the productions as follows:

$$\Pr(A \to aB) = \delta(A, a)(B);$$

$$\Pr(A \to \epsilon) = \begin{cases} 1, & \text{if } A \in F \\ 0, & \text{otherwise.} \end{cases}$$

For convenience, we say that $\mathcal{M}_G$ is the probabilistic automaton induced from the
probabilistic grammar $G$ and also $G_{\mathcal{M}}$ is the probabilistic grammar induced from the
probabilistic automaton $\mathcal{M}$. Clearly, the construction above is applicable to PACWs
and PGCWs, which gives the equivalence between them.
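The two constructions can be sketched in Python as plain dictionary conversions. This is a minimal illustration under assumed data representations: productions $A \to aB$ are stored as keys `(A, a, B)`, and the function names and the toy grammar are hypothetical rather than taken from the paper.

```python
def automaton_from_grammar(variables, start, prod_prob, eps_prob):
    """Q = V, q0 = S, F = {A : Pr(A -> eps) = 1}, delta(A, a)(B) = Pr(B | A, a)."""
    final = {A for A in variables if eps_prob.get(A, 0) == 1}
    delta = {}
    for (A, a, B), pr in prod_prob.items():
        delta.setdefault((A, a), {})[B] = pr
    return variables, delta, start, final


def grammar_from_automaton(states, delta, q0, final):
    """V = Q, S = q0, Pr(A -> aB) = delta(A, a)(B), Pr(A -> eps) = 1 iff A in F."""
    prod_prob = {(A, a, B): pr
                 for (A, a), row in delta.items()
                 for B, pr in row.items()}
    eps_prob = {A: (1 if A in final else 0) for A in states}
    return states, q0, prod_prob, eps_prob


# Hypothetical grammar over the single terminal "a" (illustration only).
V = ["S", "T"]
PROD = {("S", "a", "S"): 0.3, ("S", "a", "T"): 0.7, ("T", "a", "T"): 1.0}
EPS = {"S": 0, "T": 1}

Q, DELTA, Q0, F = automaton_from_grammar(V, "S", PROD, EPS)
V2, S2, PROD2, EPS2 = grammar_from_automaton(Q, DELTA, Q0, F)
```

Converting a grammar to an automaton and back returns the same production probabilities, which reflects that the two constructions are inverse to each other on this representation.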

3 Retractions of PACWs: Towards computing with values 

Recall that the probabilistic model of computing with words derived by Qiu and
Wang is in fact an extension from computing with values to computing with all words.
In this section, we in turn address how to tackle computing with values when we only
have a probabilistic automaton $\mathcal{M}_w = (Q, \Sigma_w, \delta, q_0, F)$ for computing with words. To
this end, we shall establish a probabilistic automaton $\mathcal{M}_w^{\downarrow} = (Q, \Sigma, \delta^{\downarrow}, q_0, F)$, where the
components $Q, q_0, F$ are the same as those of $\mathcal{M}_w$, $\Sigma$ is the underlying input alphabet
of $\mathcal{M}_w$, and $\delta^{\downarrow}$, which depends on the transition probability function of $\mathcal{M}_w$, needs to be
defined.

For any $p, q \in Q$ and $\sigma \in \Sigma$, we want to derive a formula for computing $\delta^{\downarrow}(p, \sigma)(q)$
by conditional probability. Let $C$, $N$, and $I$ denote the random variables of the current
state, the next state, and the real input (crisp input), respectively. Let $O$ represent what
we observe about the current input. Given a PACW $\mathcal{M}_w = (Q, \Sigma_w, \delta, q_0, F)$, we can
extract the following information about conditional probability: $\Pr(N = q \mid O = W, C = p) = \delta(p, W)(q)$
and $\Pr(I = \sigma \mid O = W) = W(\sigma)$. Further, we make several natural
assumptions:
(a) The prior probabilities $\Pr(W)$ are equal for all $W \in \Sigma_w$.

(b) Given the current state and the observation, the next state is independent of
the real input.

(c) Given the observation, the real input is independent of the current state.

With these assumptions, we have the following calculation.

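\[
\begin{aligned}
\delta^{\downarrow}(p, \sigma)(q)
&= \Pr(N = q \mid I = \sigma, C = p)\\
&= \frac{\Pr(N = q, I = \sigma \mid C = p)}{\Pr(I = \sigma \mid C = p)}\\
&= \frac{\sum_{W \in \Sigma_w} \Pr(N = q, I = \sigma \mid O = W, C = p)\Pr(O = W \mid C = p)}{\sum_{U \in \Sigma_w} \Pr(I = \sigma \mid O = U, C = p)\Pr(O = U \mid C = p)}\\
&= \frac{\sum_{W \in \Sigma_w} \Pr(N = q, I = \sigma \mid O = W, C = p)}{\sum_{U \in \Sigma_w} \Pr(I = \sigma \mid O = U, C = p)} && \text{(by assumption (a))}\\
&= \frac{\sum_{W \in \Sigma_w} \Pr(N = q \mid O = W, C = p)\Pr(I = \sigma \mid O = W, C = p)}{\sum_{U \in \Sigma_w} \Pr(I = \sigma \mid O = U, C = p)} && \text{(by assumption (b))}\\
&= \frac{\sum_{W \in \Sigma_w} \Pr(N = q \mid O = W, C = p)\Pr(I = \sigma \mid O = W)}{\sum_{U \in \Sigma_w} \Pr(I = \sigma \mid O = U)} && \text{(by assumption (c))}\\
&= \frac{\sum_{W \in \Sigma_w} \Pr(N = q \mid O = W, C = p)\, W(\sigma)}{\sum_{U \in \Sigma_w} U(\sigma)}\\
&= \sum_{W \in \Sigma_w} \frac{W(\sigma)}{\sum_{U \in \Sigma_w} U(\sigma)} \Pr(N = q \mid O = W, C = p)\\
&= \sum_{W \in \Sigma_w} \frac{W(\sigma)}{\sum_{U \in \Sigma_w} U(\sigma)} \cdot \delta(p, W)(q),
\end{aligned}
\]

namely,

\[
\delta^{\downarrow}(p, \sigma)(q) = \sum_{W \in \Sigma_w} \frac{W(\sigma)}{\sum_{U \in \Sigma_w} U(\sigma)} \cdot \delta(p, W)(q).
\]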

Since the definition of $\delta^{\downarrow}$ follows from a conditional probability, it is clear that for any
$p \in Q$ and $\sigma \in \Sigma$, $\delta^{\downarrow}(p, \sigma)$ is a probability distribution on $Q$. Based upon the transition
probability function $\delta^{\downarrow}$, we can get a PACV from $M_w$ as follows.

Definition 3.1. Let $M_w = (Q, \Sigma_w, \delta, q_0, F)$ be a PACW. The retraction of $M_w$ is a
PACV $M_w^{\downarrow} = (Q, \Sigma, \delta^{\downarrow}, q_0, F)$, where:

(a) The components $Q, q_0, F$ are the same as those of $M_w$.

(b) $\Sigma$ is the underlying input alphabet of $M_w$.

(c) $\delta^{\downarrow}$, the transition probability function, is a mapping from $Q \times \Sigma$ to $\mathcal{D}(Q)$ that maps
$(p, \sigma) \in Q \times \Sigma$ to a probability distribution $\delta^{\downarrow}(p, \sigma)$ on $Q$ defined by

\[
\delta^{\downarrow}(p, \sigma)(q) = \sum_{W \in \Sigma_w} \frac{W(\sigma)}{\sum_{U \in \Sigma_w} U(\sigma)} \cdot \delta(p, W)(q) \tag{1}
\]

for any $q \in Q$.

Notice that for a given $M_w$, the coefficient $W(\sigma)/\sum_{U \in \Sigma_w} U(\sigma)$ in the above equation
(1) depends only on $W$ and $\sigma$, so for convenience, we will always write $\chi_\sigma(W)$ for
$W(\sigma)/\sum_{U \in \Sigma_w} U(\sigma)$ in the rest of this paper. With this notation, (1) in Definition 3.1 is
as follows:

\[
\delta^{\downarrow}(p, \sigma)(q) = \sum_{W \in \Sigma_w} \chi_\sigma(W) \cdot \delta(p, W)(q). \tag{1$'$}
\]

The retraction of $M_w$ deals with exact inputs, and thus it may serve as a device for
computing with values. We will refer to "$\downarrow$" as the operation of obtaining the retraction.
As an example, we now derive the retraction of the PACW $\mathcal{M}$ given in Example 2.4.

Example 3.2. Consider the PACW $\mathcal{M}$ in Example 2.4. By definition, we see that
$\chi_a(W_1) = 0.9$, $\chi_a(W_2) = 0.1$, $\chi_b(W_1) = 0.1$, and $\chi_b(W_2) = 0.9$.
Further, using (1$'$) yields that

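\[
\begin{aligned}
\delta^{\downarrow}(q_0, a) &= \chi_a(W_1) \cdot \delta(q_0, W_1) + \chi_a(W_2) \cdot \delta(q_0, W_2) = 0.68\backslash q_0 + 0.265\backslash q_1 + 0.055\backslash q_2,\\
\delta^{\downarrow}(q_0, b) &= \chi_b(W_1) \cdot \delta(q_0, W_1) + \chi_b(W_2) \cdot \delta(q_0, W_2) = 0.12\backslash q_0 + 0.785\backslash q_1 + 0.095\backslash q_2,\\
\delta^{\downarrow}(q_1, a) &= \chi_a(W_1) \cdot \delta(q_1, W_1) + \chi_a(W_2) \cdot \delta(q_1, W_2) = 0.365\backslash q_0 + 0.55\backslash q_1 + 0.085\backslash q_2,\\
\delta^{\downarrow}(q_1, b) &= \chi_b(W_1) \cdot \delta(q_1, W_1) + \chi_b(W_2) \cdot \delta(q_1, W_2) = 0.085\backslash q_0 + 0.55\backslash q_1 + 0.365\backslash q_2,\\
\delta^{\downarrow}(q_2, a) &= \chi_a(W_1) \cdot \delta(q_2, W_1) + \chi_a(W_2) \cdot \delta(q_2, W_2) = 0.095\backslash q_0 + 0.775\backslash q_1 + 0.13\backslash q_2,\\
\delta^{\downarrow}(q_2, b) &= \chi_b(W_1) \cdot \delta(q_2, W_1) + \chi_b(W_2) \cdot \delta(q_2, W_2) = 0.055\backslash q_0 + 0.175\backslash q_1 + 0.77\backslash q_2.
\end{aligned}
\]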

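This transition probability function $\delta^{\downarrow}$, together with some data of $\mathcal{M}$, gives rise to
$\mathcal{M}^{\downarrow} = (\{q_0, q_1, q_2\}, \{a, b\}, \delta^{\downarrow}, q_0, \{q_2\})$. The language accepted by $\mathcal{M}^{\downarrow}$ is defined by
$L(\mathcal{M}^{\downarrow})(s) = \delta^{\downarrow}(q_0, s)(q_2)$ for all $s \in \{a, b\}^*$. For example, $L(\mathcal{M}^{\downarrow})(ab) = \delta^{\downarrow}(q_0, ab)(q_2) =
0.203675$.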
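
Equation (1$'$) is easy to evaluate mechanically. The sketch below is a hedged illustration, not the authors' implementation. Example 2.4 is not reproduced in this section, so the word set `WORDS` and all rows of `DELTA` except $\delta(q_0, W_1)$ are assumptions, reconstructed so that they agree with the values of $\chi_\sigma(W)$ and $\delta^{\downarrow}$ used in this example; the names `chi` and `retract` are hypothetical.

```python
# Words of Sigma_w as distributions over Sigma = {a, b} (assumed, see above).
WORDS = {"W1": {"a": 0.9, "b": 0.1}, "W2": {"a": 0.1, "b": 0.9}}

# DELTA[(p, W)] lists delta(p, W) over (q0, q1, q2). Only the (q0, W1) row is
# quoted in the text; the remaining rows are reconstructed assumptions.
DELTA = {
    ("q0", "W1"): (0.75, 0.20, 0.05),
    ("q0", "W2"): (0.05, 0.85, 0.10),
    ("q1", "W1"): (0.40, 0.55, 0.05),
    ("q1", "W2"): (0.05, 0.55, 0.40),
    ("q2", "W1"): (0.10, 0.85, 0.05),
    ("q2", "W2"): (0.05, 0.10, 0.85),
}


def chi(sigma, word):
    """chi_sigma(W) = W(sigma) / sum over U in Sigma_w of U(sigma)."""
    total = sum(u[sigma] for u in WORDS.values())
    return WORDS[word][sigma] / total


def retract(p, sigma):
    """delta_down(p, sigma) over (q0, q1, q2), following equation (1')."""
    out = [0.0, 0.0, 0.0]
    for word in WORDS:
        weight = chi(sigma, word)
        for j, prob in enumerate(DELTA[(p, word)]):
            out[j] += weight * prob
    return out


for p in ("q0", "q1", "q2"):
    for sigma in ("a", "b"):
        print(p, sigma, [round(x, 6) for x in retract(p, sigma)])
```

Each row returned by `retract` is a convex combination of the rows $\delta(p, W)$, which is why it is again a probability distribution on the states.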

We end this subsection by making a close link between computing with values and 
computing with words. 

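Theorem 3.3. Suppose that $M_w = (Q, \Sigma_w, \delta, q_0, F)$ is a PACW and $M_w^{\downarrow} =
(Q, \Sigma, \delta^{\downarrow}, q_0, F)$ is the retraction of $M_w$. Then for any $s = \sigma_1 \cdots \sigma_l \in \Sigma^*$, we have
that

\[
L(M_w^{\downarrow})(s) = \sum_{W_1, \ldots, W_l \in \Sigma_w} L_w(M_w)(W_1 \cdots W_l) \cdot \prod_{i=1}^{l} \chi_{\sigma_i}(W_i),
\]

where $\chi_{\sigma_i}(W_i) = W_i(\sigma_i)/\sum_{U \in \Sigma_w} U(\sigma_i)$.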
The above theorem may be seen as a retraction principle from computing with words
to computing with values. The meaning of this theorem is that computing with values
can be implemented by computing with words. The advantage of this approach is that
we can directly obtain the accepting probability of a string of values from the accepting
probabilities of some strings of words, that is, we need not compute the transition
probability function $\delta^{\downarrow}$; the price of doing so is that the number of computations for
implementing computing with values by computing with words increases exponentially
with the length of the input string. To illustrate this, let us revisit Example 3.2 to compute
$L(\mathcal{M}^{\downarrow})(ab)$ by using the result of Theorem 3.3. By the equality in Theorem 3.3 and the
calculated results in Example 2.4, we obtain that

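\[
\begin{aligned}
L(\mathcal{M}^{\downarrow})(ab) &= L_w(\mathcal{M})(W_1W_1)\chi_a(W_1)\chi_b(W_1) + L_w(\mathcal{M})(W_1W_2)\chi_a(W_1)\chi_b(W_2)\\
&\quad + L_w(\mathcal{M})(W_2W_1)\chi_a(W_2)\chi_b(W_1) + L_w(\mathcal{M})(W_2W_2)\chi_a(W_2)\chi_b(W_2)\\
&= 0.05 \times 0.9 \times 0.1 + 0.1975 \times 0.9 \times 0.9 + 0.05 \times 0.1 \times 0.1 + 0.43 \times 0.1 \times 0.9\\
&= 0.203675.
\end{aligned}
\]

This coincides with the result obtained from the transition probability function of $\mathcal{M}^{\downarrow}$.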
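
The agreement of the two routes can also be checked numerically. The following sketch is an illustration only: it copies the rows of $\delta^{\downarrow}$ from Example 3.2 and the values of $\chi_\sigma(W)$ and $L_w(\mathcal{M})$ quoted in the computation above; the function names are hypothetical. Because only the accepting probabilities of word strings of length two are quoted, `accept_via_words` is restricted to inputs of length two.

```python
from itertools import product

STATES = ("q0", "q1", "q2")

# Rows of delta_down from Example 3.2, ordered as (q0, q1, q2).
DELTA_DOWN = {
    ("q0", "a"): (0.68, 0.265, 0.055),
    ("q0", "b"): (0.12, 0.785, 0.095),
    ("q1", "a"): (0.365, 0.55, 0.085),
    ("q1", "b"): (0.085, 0.55, 0.365),
    ("q2", "a"): (0.095, 0.775, 0.13),
    ("q2", "b"): (0.055, 0.175, 0.77),
}
# chi_sigma(W) and L_w(M)(W_i W_j) as quoted in the computation above.
CHI = {("a", "W1"): 0.9, ("b", "W1"): 0.1, ("a", "W2"): 0.1, ("b", "W2"): 0.9}
L_W = {("W1", "W1"): 0.05, ("W1", "W2"): 0.1975,
       ("W2", "W1"): 0.05, ("W2", "W2"): 0.43}


def accept_direct(s, start="q0", final=("q2",)):
    """L(M_down)(s): push the state distribution through delta_down."""
    dist = {q: 1.0 if q == start else 0.0 for q in STATES}
    for sigma in s:
        nxt = dict.fromkeys(STATES, 0.0)
        for p, mass in dist.items():
            for q, prob in zip(STATES, DELTA_DOWN[(p, sigma)]):
                nxt[q] += mass * prob
        dist = nxt
    return sum(dist[q] for q in final)


def accept_via_words(s):
    """Right-hand side of Theorem 3.3; enumerates |Sigma_w|**len(s) strings."""
    total = 0.0
    for ws in product(("W1", "W2"), repeat=len(s)):
        weight = 1.0
        for sigma, w in zip(s, ws):
            weight *= CHI[(sigma, w)]
        total += L_W[ws] * weight
    return total


print(round(accept_direct("ab"), 6), round(accept_via_words("ab"), 6))
```

The enumeration in `accept_via_words` makes the exponential cost visible: an input of length $l$ requires $|\Sigma_w|^l$ terms, whereas `accept_direct` needs only $l$ matrix-vector steps.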
We are now ready to prove Theorem 3.3. To this end, it is convenient to have the
following lemma.

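Lemma 3.4. Let $M_w = (Q, \Sigma_w, \delta, q_0, F)$ be a PACW and $M_w^{\downarrow} = (Q, \Sigma, \delta^{\downarrow}, q_0, F)$ be the
retraction of $M_w$. Then for any $p, q \in Q$ and $s = \sigma_1 \cdots \sigma_l \in \Sigma^*$, we have that

\[
\delta^{\downarrow}(p, s)(q) = \sum_{W_1, \ldots, W_l \in \Sigma_w} \delta(p, W_1 \cdots W_l)(q) \cdot \prod_{i=1}^{l} \chi_{\sigma_i}(W_i).
\]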

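Proof. We prove it by induction on $l$.

1) For the basis step, namely, $l = 0$, it is trivial.

2) The induction hypothesis is that the above equation holds for $s = \sigma_1 \cdots \sigma_l$. We
now prove the same for $s\sigma_{l+1}$. Using the definition of $\delta^{\downarrow}$ and the induction hypothesis,
we have the following.

\[
\begin{aligned}
\delta^{\downarrow}(p, s\sigma_{l+1})(q)
&= \delta^{\downarrow}(p, \sigma_1 \cdots \sigma_l \sigma_{l+1})(q)\\
&= \sum_{q' \in Q} \Big[ \sum_{W_1, \ldots, W_l \in \Sigma_w} \delta(p, W_1 \cdots W_l)(q') \cdot \prod_{i=1}^{l} \chi_{\sigma_i}(W_i) \Big] \cdot \delta^{\downarrow}(q', \sigma_{l+1})(q)\\
&= \sum_{q' \in Q} \sum_{W_1, \ldots, W_l \in \Sigma_w} \delta(p, W_1 \cdots W_l)(q') \cdot \delta^{\downarrow}(q', \sigma_{l+1})(q) \cdot \prod_{i=1}^{l} \chi_{\sigma_i}(W_i)\\
&= \sum_{q' \in Q} \sum_{W_1, \ldots, W_l \in \Sigma_w} \delta(p, W_1 \cdots W_l)(q') \cdot \Big[ \sum_{W_{l+1} \in \Sigma_w} \chi_{\sigma_{l+1}}(W_{l+1}) \cdot \delta(q', W_{l+1})(q) \Big] \cdot \prod_{i=1}^{l} \chi_{\sigma_i}(W_i)\\
&= \sum_{q' \in Q} \sum_{W_1, \ldots, W_l \in \Sigma_w} \sum_{W_{l+1} \in \Sigma_w} \delta(p, W_1 \cdots W_l)(q') \cdot \delta(q', W_{l+1})(q) \cdot \prod_{i=1}^{l+1} \chi_{\sigma_i}(W_i)\\
&= \sum_{q' \in Q} \sum_{W_1, \ldots, W_{l+1} \in \Sigma_w} \delta(p, W_1 \cdots W_l)(q') \cdot \delta(q', W_{l+1})(q) \cdot \prod_{i=1}^{l+1} \chi_{\sigma_i}(W_i)\\
&= \sum_{W_1, \ldots, W_{l+1} \in \Sigma_w} \sum_{q' \in Q} \delta(p, W_1 \cdots W_l)(q') \cdot \delta(q', W_{l+1})(q) \cdot \prod_{i=1}^{l+1} \chi_{\sigma_i}(W_i)\\
&= \sum_{W_1, \ldots, W_{l+1} \in \Sigma_w} \Big[ \sum_{q' \in Q} \delta(p, W_1 \cdots W_l)(q') \cdot \delta(q', W_{l+1})(q) \Big] \cdot \prod_{i=1}^{l+1} \chi_{\sigma_i}(W_i)\\
&= \sum_{W_1, \ldots, W_{l+1} \in \Sigma_w} \delta(p, W_1 \cdots W_{l+1})(q) \cdot \prod_{i=1}^{l+1} \chi_{\sigma_i}(W_i),
\end{aligned}
\]

namely, $\delta^{\downarrow}(p, s\sigma_{l+1})(q) = \sum_{W_1, \ldots, W_{l+1} \in \Sigma_w} \delta(p, W_1 \cdots W_{l+1})(q) \cdot \prod_{i=1}^{l+1} \chi_{\sigma_i}(W_i)$, which
proves the lemma. $\Box$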

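We are now in a position to verify Theorem 3.3.

Proof of Theorem 3.3. By the definition of $L(M_w^{\downarrow})$ and Lemma 3.4, we have the
equation below.

\[
\begin{aligned}
L(M_w^{\downarrow})(s)
&= \sum_{q \in F} \delta^{\downarrow}(q_0, \sigma_1 \cdots \sigma_l)(q)\\
&= \sum_{q \in F} \sum_{W_1, \ldots, W_l \in \Sigma_w} \delta(q_0, W_1 \cdots W_l)(q) \cdot \prod_{i=1}^{l} \chi_{\sigma_i}(W_i)\\
&= \sum_{W_1, \ldots, W_l \in \Sigma_w} \sum_{q \in F} \delta(q_0, W_1 \cdots W_l)(q) \cdot \prod_{i=1}^{l} \chi_{\sigma_i}(W_i)\\
&= \sum_{W_1, \ldots, W_l \in \Sigma_w} \Big[ \sum_{q \in F} \delta(q_0, W_1 \cdots W_l)(q) \Big] \cdot \prod_{i=1}^{l} \chi_{\sigma_i}(W_i)\\
&= \sum_{W_1, \ldots, W_l \in \Sigma_w} L_w(M_w)(W_1 \cdots W_l) \cdot \prod_{i=1}^{l} \chi_{\sigma_i}(W_i),
\end{aligned}
\]

which completes the proof of the theorem. $\Box$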

Remark 3.5. One can readily verify that Definition 3.1 and Theorem 3.3 remain valid
without any modifications when the words are interpreted as possibility distributions.
4 Generalized extensions of PACWs: Towards computing with all words

Having finished the transformation from computing with words to computing with
values in the preceding section, we turn our attention to another transformation which
makes a PACW more robust in the sense that it can deal with more inputs. More
explicitly, suppose that there is a PACW $M_w = (Q, \Sigma_w, \delta, q_0, F)$ for computing with
words. Note that the input alphabet $\Sigma_w$ comprises only finitely many words over $\Sigma$. To allow
more words as inputs, we will extend $\delta$ to a transition probability function $\delta^{\uparrow} : Q \times
\mathcal{D}(\Sigma) \to \mathcal{D}(Q)$. As a result, we will obtain a PACAW $M_w^{\uparrow} = (Q, \mathcal{D}(\Sigma), \delta^{\uparrow}, q_0, F)$ which
can accept more words than those in $\Sigma_w$ as inputs.

We derive the PACAW $M_w^{\uparrow}$ and discuss the computation of $L_w(M_w^{\uparrow})$ in the first
subsection. In Section 4.2, we examine some analytical properties of generalized extensions
that compare the transition probabilities of two near inputs.

4.1 Generalized extensions of PACWs 

Let $M_w = (Q, \Sigma_w, \delta, q_0, F)$ be a PACW. Recall that in the definition of retractions,
the key ingredient is the induced transition probability function $\delta^{\downarrow}$. Now, we would like
to use the definition of $\delta^{\downarrow}$ to derive the transition probability function $\delta^{\uparrow}$ for dealing
with inputs of arbitrary words.

Let us begin with some special words, Dirac distributions. Clearly, the transition
probabilities of a Dirac distribution $\hat{\sigma} \in \mathcal{D}(\Sigma)$ and the corresponding $\sigma \in \Sigma$ should be
the same when considering them as inputs of $M_w$. In light of this, it is reasonable to
define that for any $p \in Q$ and any Dirac distribution $\hat{\sigma} \in \mathcal{D}(\Sigma)$,

\[
\delta^{\uparrow}(p, \hat{\sigma}) = \delta^{\downarrow}(p, \sigma).
\]

We now consider the case of an arbitrary word as input. For any $W' \in \mathcal{D}(\Sigma)$, we have that
$W' = \sum_{\sigma \in \Sigma} W'(\sigma) \cdot \hat{\sigma}$. We thus see that the Dirac distributions $\hat{\sigma}$, $\sigma \in \Sigma$, play the role of a
basis. So we extend by linearity the previous definition of $\delta^{\uparrow}$ for Dirac distributions
to $W'$ as follows:

\[
\delta^{\uparrow}(p, W') = \sum_{\sigma \in \Sigma} W'(\sigma) \cdot \delta^{\uparrow}(p, \hat{\sigma}). \tag{2}
\]

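Since $\delta^{\uparrow}(p, \hat{\sigma}) = \delta^{\downarrow}(p, \sigma) = \sum_{W \in \Sigma_w} \chi_\sigma(W) \cdot \delta(p, W)$ by the equation (1$'$), we obtain that

\[
\begin{aligned}
\delta^{\uparrow}(p, W')
&= \sum_{\sigma \in \Sigma} W'(\sigma) \cdot \delta^{\uparrow}(p, \hat{\sigma})\\
&= \sum_{\sigma \in \Sigma} W'(\sigma) \cdot \Big[ \sum_{W \in \Sigma_w} \chi_\sigma(W) \cdot \delta(p, W) \Big]\\
&= \sum_{\sigma \in \Sigma} \sum_{W \in \Sigma_w} W'(\sigma) \cdot \chi_\sigma(W) \cdot \delta(p, W)\\
&= \sum_{W \in \Sigma_w} \sum_{\sigma \in \Sigma} W'(\sigma) \cdot \chi_\sigma(W) \cdot \delta(p, W)\\
&= \sum_{W \in \Sigma_w} \Big[ \sum_{\sigma \in \Sigma} W'(\sigma) \cdot \chi_\sigma(W) \Big] \cdot \delta(p, W).
\end{aligned}
\]

Since $\chi_\sigma(W)$ in the above equation depends merely on $W$ and $\sigma$, it follows that the
sum $\sum_{\sigma \in \Sigma} W'(\sigma) \cdot \chi_\sigma(W)$ depends only on $W'$ and $W$, and hence, we will always
write $\theta_{W'}(W)$ for $\sum_{\sigma \in \Sigma} W'(\sigma) \cdot \chi_\sigma(W)$ for the sake of convenience. As a result, we get
that

\[
\delta^{\uparrow}(p, W') = \sum_{W \in \Sigma_w} \theta_{W'}(W) \cdot \delta(p, W).
\]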

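Based on the definition of $\delta^{\uparrow}$, we have the following.

Definition 4.1. Let $M_w = (Q, \Sigma_w, \delta, q_0, F)$ be a PACW. The generalized extension of
$M_w$ is a PACAW $M_w^{\uparrow} = (Q, \mathcal{D}(\Sigma), \delta^{\uparrow}, q_0, F)$, where:

(a) The components $Q, q_0, F$ are the same as those of $M_w$.

(b) $\mathcal{D}(\Sigma)$ consists of all probability distributions over the underlying input alphabet of
$M_w$.

(c) $\delta^{\uparrow}$, the transition probability function, is a mapping from $Q \times \mathcal{D}(\Sigma)$ to $\mathcal{D}(Q)$ defined
by

\[
\delta^{\uparrow}(p, W') = \sum_{W \in \Sigma_w} \theta_{W'}(W) \cdot \delta(p, W) \tag{3}
\]

for any $(p, W') \in Q \times \mathcal{D}(\Sigma)$, where $\theta_{W'}(W) = \sum_{\sigma \in \Sigma} W'(\sigma) \cdot [W(\sigma)/\sum_{U \in \Sigma_w} U(\sigma)]$.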

For any $(p, W') \in Q \times \mathcal{D}(\Sigma)$, it follows from the definition of $\delta^{\uparrow}$ that $\delta^{\uparrow}(p, W')$
is indeed a probability distribution on $Q$, so Definition 4.1 is valid. As we see from
Definition 4.1, the generalized extension $M_w^{\uparrow}$ of $M_w$ can deal with all words over the
underlying input alphabet of $M_w$ as inputs. We thus consider $M_w^{\uparrow}$ as a device for
computing with all words and refer to "$\uparrow$" as the operation of obtaining the generalized
extension.

The formula given in Definition 4.1 for computing $\delta^{\uparrow}(p, W')$ seems complicated. In
fact, this is not a problem since one can use the equation (2) to compute it. By the
argument that $\delta^{\uparrow}(p, \hat{\sigma}) = \delta^{\downarrow}(p, \sigma)$ for all $\sigma \in \Sigma$, we see that it is easy to compute
$\delta^{\uparrow}(p, W')$ once we have obtained the retraction $M_w^{\downarrow}$ of $M_w$.

We now present an example to illustrate the previous definition. 

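Example 4.2. Let us first derive the generalized extension of the PACW $\mathcal{M}$ produced in
Example 2.4. By Definition 4.1, we have that $\mathcal{M}^{\uparrow} = (\{q_0, q_1, q_2\}, \mathcal{D}(\{a, b\}), \delta^{\uparrow}, q_0, \{q_2\})$,
where $\delta^{\uparrow}$ follows from the following calculation: For any $W' = \alpha\backslash a + (1 - \alpha)\backslash b \in \mathcal{D}(\{a, b\})$
with $\alpha \in [0, 1]$, by the equation (2) (or equivalently, the equation (3)) we have that

\[
\begin{aligned}
\delta^{\uparrow}(q_0, W') &= W'(a) \cdot \delta^{\uparrow}(q_0, \hat{a}) + W'(b) \cdot \delta^{\uparrow}(q_0, \hat{b})\\
&= W'(a) \cdot \delta^{\downarrow}(q_0, a) + W'(b) \cdot \delta^{\downarrow}(q_0, b)\\
&= (0.12 + 0.56\alpha)\backslash q_0 + (0.785 - 0.52\alpha)\backslash q_1 + (0.095 - 0.04\alpha)\backslash q_2.
\end{aligned}
\]

By the same token, we have the following:

\[
\begin{aligned}
\delta^{\uparrow}(q_1, W') &= (0.085 + 0.28\alpha)\backslash q_0 + 0.55\backslash q_1 + (0.365 - 0.28\alpha)\backslash q_2,\\
\delta^{\uparrow}(q_2, W') &= (0.055 + 0.04\alpha)\backslash q_0 + (0.175 + 0.6\alpha)\backslash q_1 + (0.77 - 0.64\alpha)\backslash q_2.
\end{aligned}
\]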
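
Equation (2) reduces the generalized extension to a weighted average of the rows of the retraction. The sketch below illustrates this under stated assumptions: the rows of $\delta^{\downarrow}$ are copied from Example 3.2, while the function name `extend` and the sample value `alpha = 0.2` are hypothetical choices made only for demonstration.

```python
# Rows of delta_down from Example 3.2, ordered as (q0, q1, q2).
DELTA_DOWN = {
    ("q0", "a"): (0.68, 0.265, 0.055),
    ("q0", "b"): (0.12, 0.785, 0.095),
    ("q1", "a"): (0.365, 0.55, 0.085),
    ("q1", "b"): (0.085, 0.55, 0.365),
    ("q2", "a"): (0.095, 0.775, 0.13),
    ("q2", "b"): (0.055, 0.175, 0.77),
}


def extend(p, word):
    """delta_up(p, W') = sum over sigma of W'(sigma) * delta_down(p, sigma)."""
    out = [0.0, 0.0, 0.0]
    for sigma, weight in word.items():
        for j, prob in enumerate(DELTA_DOWN[(p, sigma)]):
            out[j] += weight * prob
    return out


alpha = 0.2  # sample mixing weight, chosen only for illustration
w_prime = {"a": alpha, "b": 1 - alpha}
for p in ("q0", "q1", "q2"):
    print(p, [round(x, 6) for x in extend(p, w_prime)])
```

For a Dirac distribution such as `{"a": 1.0, "b": 0.0}` the function returns the corresponding row of $\delta^{\downarrow}$ unchanged, in line with the definition of $\delta^{\uparrow}$ on Dirac distributions.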



The following remark justifies the name of generalized extensions. 
Remark 4.3. We remark that the generalized extension is generally not an extension
in a strictly mathematical sense, that is, there may exist $p \in Q$ and $W \in \Sigma_w$ such that
$\delta^{\uparrow}(p, W) \neq \delta(p, W)$. For instance, in Example 4.2,

\[
\delta(q_0, W_1) = 0.75\backslash q_0 + 0.2\backslash q_1 + 0.05\backslash q_2,
\]

while

\[
\delta^{\uparrow}(q_0, W_1) = 0.624\backslash q_0 + 0.317\backslash q_1 + 0.059\backslash q_2;
\]

they are not equal. The appearance of such an inequality is not surprising once we
notice that the calculation of $\delta^{\uparrow}(p, W)$ depends on all $W' \in \Sigma_w$ and $\delta(p, W')$, while the
words in $\Sigma_w$ may overlap one another. Clearly, if the calculation of $\delta^{\uparrow}(p, W)$ is not
disturbed by those $W' \in \Sigma_w \setminus \{W\}$ and $\delta(p, W')$, then the generalized extension must
be an extension. For example, if each word in $\Sigma_w$ degenerates into a Dirac distribution,
then it is not hard to check that the generalized extension is indeed an extension.
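
The inequality in this remark can be reproduced with equation (3). The sketch below is a hedged illustration restricted to the state $q_0$. The row $\delta(q_0, W_1)$ is the one quoted above; the word set `WORDS` and the row $\delta(q_0, W_2)$ are assumptions, since Example 2.4 is not reproduced in this section, and the function names are hypothetical.

```python
# Assumed words of Sigma_w over Sigma = {a, b} (Example 2.4 is not shown here).
WORDS = {"W1": {"a": 0.9, "b": 0.1}, "W2": {"a": 0.1, "b": 0.9}}
# delta(q0, W) over (q0, q1, q2); the W1 row is quoted, the W2 row is assumed.
DELTA_Q0 = {"W1": (0.75, 0.20, 0.05), "W2": (0.05, 0.85, 0.10)}


def theta(w_prime, name):
    """theta_{W'}(W) = sum over sigma of W'(sigma) * W(sigma) / sum_U U(sigma)."""
    return sum(
        w_prime[s] * WORDS[name][s] / sum(u[s] for u in WORDS.values())
        for s in w_prime
    )


def extend_q0(w_prime):
    """delta_up(q0, W') following equation (3)."""
    out = [0.0, 0.0, 0.0]
    for name, row in DELTA_Q0.items():
        weight = theta(w_prime, name)
        for j, prob in enumerate(row):
            out[j] += weight * prob
    return out


extended = extend_q0(WORDS["W1"])
print("delta_up(q0, W1):", [round(x, 6) for x in extended])
print("delta(q0, W1):   ", DELTA_Q0["W1"])
```

Under these assumptions $\theta_{W_1}(W_1) = 0.82$ and $\theta_{W_1}(W_2) = 0.18$, so part of the weight leaks to $W_2$ because the two words overlap; this is the source of the inequality.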

The equation (2) also motivates us to consider the linearity of $\delta^{\uparrow}(p, W')$ in the
second argument. To make this precise, we need more notions. A vector is called
stochastic if all its entries are nonnegative and the sum of its entries equals 1. Assume
that $\Sigma = \{\sigma_1, \sigma_2, \ldots, \sigma_n\}$. Then each word $W$ over $\Sigma$ can be uniquely written as an
$n$-dimensional stochastic row vector $[W(\sigma_1), W(\sigma_2), \ldots, W(\sigma_n)]$. A linear combination of
some words $W_1, W_2, \ldots, W_l$ over $\Sigma$ is an expression of the form $k_1W_1 + k_2W_2 + \cdots + k_lW_l$,
where all $k_i$'s lie in $\mathbb{R}$, the real numbers. In other words, the linear combination $k_1W_1 +
k_2W_2 + \cdots + k_lW_l$ is the $n$-dimensional row vector $[c_1, c_2, \ldots, c_n]$ with $c_j = \sum_{i=1}^{l} k_iW_i(\sigma_j)$,
$j = 1, \ldots, n$.

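A linear combination of words does not necessarily yield a word. However, if the
linear combination is indeed a word, then the transition probability when inputting the
linear combination can be computed in the following way.

Proposition 4.4. Suppose that the linear combination $W' = \sum_{i=1}^{l} k_iW_i'$ with $W_i' \in
\mathcal{D}(\Sigma)$ is a word over $\Sigma$. Then $\delta^{\uparrow}(p, W') = \sum_{i=1}^{l} k_i \cdot \delta^{\uparrow}(p, W_i')$.

Proof. It follows from the equation (2) that

\[
\begin{aligned}
\delta^{\uparrow}(p, W')
&= \sum_{\sigma \in \Sigma} W'(\sigma) \cdot \delta^{\uparrow}(p, \hat{\sigma})\\
&= \sum_{\sigma \in \Sigma} \Big[ \sum_{i=1}^{l} k_iW_i'(\sigma) \Big] \cdot \delta^{\uparrow}(p, \hat{\sigma})\\
&= \sum_{i=1}^{l} k_i \cdot \sum_{\sigma \in \Sigma} W_i'(\sigma) \cdot \delta^{\uparrow}(p, \hat{\sigma})\\
&= \sum_{i=1}^{l} k_i \cdot \delta^{\uparrow}(p, W_i'),
\end{aligned}
\]

which finishes the proof of the proposition. $\Box$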

The proposition above shows that $\delta^{\uparrow}(p, W')$ is linear in the second argument.
Further, this proposition gives rise to a simple corollary which is helpful to calculate
$\delta^{\uparrow}(p, W')$. To state this result, we appeal to a concept from linear algebra.

A set of some words over $\Sigma$ is linearly independent if none of its elements is a linear
combination of the others. Keep the assumption that $\Sigma = \{\sigma_1, \sigma_2, \ldots, \sigma_n\}$. It then
follows from the theory of linear algebra that the number of linearly independent words
over $\Sigma$ is at most $n$. Further, if $W_1', \ldots, W_n'$ are $n$ linearly independent words over $\Sigma$,
then any word $W'$ over $\Sigma$ can be uniquely expressed as a linear combination of these
words, that is, $W' = k_1W_1' + \cdots + k_nW_n'$ for some $k_i \in \mathbb{R}$. Finding $n$ linearly independent
words over $\Sigma$ and expressing a word in the form of a linear combination of these words are
fairly routine exercises in linear algebra; we omit details here. The following corollary is
a generalization of the equation (2), which allows us to compute transition probabilities
from an arbitrary set of linearly independent words.

Corollary 4.5. To compute $\delta^{\uparrow}(p, W')$ for any $(p, W') \in Q \times \mathcal{D}(\Sigma)$, we can follow the
steps below:

(1) Find any $n$ linearly independent words over $\Sigma$, say $W_1', \ldots, W_n'$, and write $W' =
k_1W_1' + \cdots + k_nW_n'$.

(2) Compute $\delta^{\uparrow}(p, W_i')$ for all $i = 1, \ldots, n$.

(3) Compute the sum $\sum_{i=1}^{n} k_i \cdot \delta^{\uparrow}(p, W_i')$, which exactly equals $\delta^{\uparrow}(p, W')$.

Proof. It follows immediately from some facts on linear algebra and Proposition
4.4. $\Box$
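
The three steps of the corollary can be sketched for $n = 2$ as follows. This is an illustration under assumptions: the two rows of $\delta^{\downarrow}(q_0, \cdot)$ are copied from Example 3.2, the basis words and the target word are sample choices, words are encoded as pairs $(W(a), W(b))$, and the coefficients are obtained with Cramer's rule, which is one of several routine ways to solve the linear system.

```python
# delta_down(q0, a) and delta_down(q0, b) from Example 3.2, over (q0, q1, q2).
DOWN_A = (0.68, 0.265, 0.055)
DOWN_B = (0.12, 0.785, 0.095)


def extend_q0(word):
    """delta_up(q0, W') by equation (2); word = (W'(a), W'(b))."""
    return tuple(word[0] * x + word[1] * y for x, y in zip(DOWN_A, DOWN_B))


def coefficients(target, basis1, basis2):
    """Step (1): solve target = k1*basis1 + k2*basis2 by Cramer's rule."""
    det = basis1[0] * basis2[1] - basis2[0] * basis1[1]
    if abs(det) < 1e-12:
        raise ValueError("basis words are linearly dependent")
    k1 = (target[0] * basis2[1] - basis2[0] * target[1]) / det
    k2 = (basis1[0] * target[1] - target[0] * basis1[1]) / det
    return k1, k2


basis1, basis2 = (0.9, 0.1), (0.1, 0.9)   # sample linearly independent words
target = (0.2, 0.8)                       # sample word W'
k1, k2 = coefficients(target, basis1, basis2)
row1, row2 = extend_q0(basis1), extend_q0(basis2)        # step (2)
combined = tuple(k1 * x + k2 * y for x, y in zip(row1, row2))  # step (3)
print(round(k1, 6), round(k2, 6), [round(x, 6) for x in combined])
```

The coefficients need not lie in $[0, 1]$ in general; the corollary only requires that the combination itself is a word.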

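Analogous to Theorem 3.3, we can also establish a close link between computing
with some special words (i.e., those in $\Sigma_w$) and computing with all words.

Theorem 4.6. Suppose that $M_w = (Q, \Sigma_w, \delta, q_0, F)$ is a PACW and $M_w^{\uparrow} =
(Q, \mathcal{D}(\Sigma), \delta^{\uparrow}, q_0, F)$ is the generalized extension of $M_w$. Then for any $S = W_1' \cdots W_l' \in
\mathcal{D}(\Sigma)^*$, we have that

\[
L_w(M_w^{\uparrow})(S) = \sum_{W_1, \ldots, W_l \in \Sigma_w} L_w(M_w)(W_1 \cdots W_l) \cdot \prod_{i=1}^{l} \theta_{W_i'}(W_i),
\]

where $\theta_{W_i'}(W_i) = \sum_{\sigma \in \Sigma} W_i'(\sigma) \cdot [W_i(\sigma)/\sum_{U \in \Sigma_w} U(\sigma)]$.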

Theorem 4.6 may be seen as a generalized extension principle from computing with
special words to computing with all words. The meaning of this theorem is that
computing with all words can be implemented by computing with special words; and thus,
it gives us a way of dealing with arbitrary words as inputs of a PACW. It is clear that
the number of computations for implementing computing with all words by computing
with words increases exponentially with the length of the input string. To see this, let us
revisit Example 4.2.

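Example 4.7. Consider the generalized extension $\mathcal{M}^{\uparrow} = (\{q_0, q_1, q_2\}, \mathcal{D}(\{a, b\}), \delta^{\uparrow},
q_0, \{q_2\})$ in Example 4.2. As an example, taking $W' = 0.2\backslash a + 0.8\backslash b$, we now use
two approaches, the definition of $\delta^{\uparrow}$ and Theorem 4.6, to compute $L_w(\mathcal{M}^{\uparrow})(W'W')$.
By Example 4.2, we see that

\[
\begin{aligned}
\delta^{\uparrow}(q_0, W') &= 0.232\backslash q_0 + 0.681\backslash q_1 + 0.087\backslash q_2,\\
\delta^{\uparrow}(q_1, W') &= 0.141\backslash q_0 + 0.55\backslash q_1 + 0.309\backslash q_2,\\
\delta^{\uparrow}(q_2, W') &= 0.063\backslash q_0 + 0.295\backslash q_1 + 0.642\backslash q_2.
\end{aligned}
\]

Thereby, it follows from the definition of word languages that

\[
\begin{aligned}
L_w(\mathcal{M}^{\uparrow})(W'W') &= \delta^{\uparrow}(q_0, W'W')(q_2)\\
&= \sum_{q \in Q} \delta^{\uparrow}(q_0, W')(q) \cdot \delta^{\uparrow}(q, W')(q_2)\\
&= \delta^{\uparrow}(q_0, W')(q_0) \cdot \delta^{\uparrow}(q_0, W')(q_2) + \delta^{\uparrow}(q_0, W')(q_1) \cdot \delta^{\uparrow}(q_1, W')(q_2)\\
&\quad + \delta^{\uparrow}(q_0, W')(q_2) \cdot \delta^{\uparrow}(q_2, W')(q_2)\\
&= 0.286467.
\end{aligned}
\]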

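We now calculate $L_w(\mathcal{M}^{\uparrow})(W'W')$ using Theorem 4.6. By definition, we have that
$\theta_{W'}(W_1) = 0.2 \times 0.9 + 0.8 \times 0.1 = 0.26$ and $\theta_{W'}(W_2) = 0.2 \times 0.1 + 0.8 \times 0.9 = 0.74$.
In Example 2.4 we have obtained that $L_w(\mathcal{M})(W_1W_1) = 0.05$, $L_w(\mathcal{M})(W_1W_2) =
0.1975$, $L_w(\mathcal{M})(W_2W_1) = 0.05$, and $L_w(\mathcal{M})(W_2W_2) = 0.43$. It thus follows from
Theorem 4.6 that

\[
\begin{aligned}
L_w(\mathcal{M}^{\uparrow})(W'W') &= L_w(\mathcal{M})(W_1W_1) \cdot \theta_{W'}(W_1) \cdot \theta_{W'}(W_1)\\
&\quad + L_w(\mathcal{M})(W_1W_2) \cdot \theta_{W'}(W_1) \cdot \theta_{W'}(W_2)\\
&\quad + L_w(\mathcal{M})(W_2W_1) \cdot \theta_{W'}(W_2) \cdot \theta_{W'}(W_1)\\
&\quad + L_w(\mathcal{M})(W_2W_2) \cdot \theta_{W'}(W_2) \cdot \theta_{W'}(W_2)\\
&= 0.286467,
\end{aligned}
\]

as desired.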
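
The computation by Theorem 4.6 can be replayed in a few lines. The sketch below is illustrative only: it takes the values $\chi_\sigma(W)$ and $L_w(\mathcal{M})(W_iW_j)$ quoted for this running example as given data, and the function names are hypothetical. Since only accepting probabilities of word strings of length two are available, the function handles inputs of length two.

```python
from itertools import product

# chi_sigma(W) and L_w(M)(W_i W_j) as quoted for the running example.
CHI = {("a", "W1"): 0.9, ("b", "W1"): 0.1, ("a", "W2"): 0.1, ("b", "W2"): 0.9}
L_W = {("W1", "W1"): 0.05, ("W1", "W2"): 0.1975,
       ("W2", "W1"): 0.05, ("W2", "W2"): 0.43}


def theta(w_prime, w):
    """theta_{W'}(W) = sum over sigma of W'(sigma) * chi_sigma(W)."""
    return sum(w_prime[sigma] * CHI[(sigma, w)] for sigma in w_prime)


def accept_via_special_words(inputs):
    """Right-hand side of Theorem 4.6 for a list of input words W'_1 ... W'_l."""
    total = 0.0
    for ws in product(("W1", "W2"), repeat=len(inputs)):
        weight = 1.0
        for w_prime, w in zip(inputs, ws):
            weight *= theta(w_prime, w)
        total += L_W[ws] * weight
    return total


w_prime = {"a": 0.2, "b": 0.8}
print(round(theta(w_prime, "W1"), 6), round(theta(w_prime, "W2"), 6))
print(round(accept_via_special_words([w_prime, w_prime]), 6))
```

As with the retraction principle, the sum ranges over $|\Sigma_w|^l$ word strings, so this route is practical only for short inputs.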

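The proof of Theorem 4.6 proceeds along the same lines as the proof of Theorem
3.3; let us first establish the following lemma.

Lemma 4.8. Let $M_w = (Q, \Sigma_w, \delta, q_0, F)$ be a PACW and $M_w^{\uparrow} = (Q, \mathcal{D}(\Sigma), \delta^{\uparrow}, q_0, F)$ be
the generalized extension of $M_w$. Then for any $p, q \in Q$ and $S = W_1' \cdots W_l' \in \mathcal{D}(\Sigma)^*$,
we have that

\[
\delta^{\uparrow}(p, S)(q) = \sum_{W_1, \ldots, W_l \in \Sigma_w} \delta(p, W_1 \cdots W_l)(q) \cdot \prod_{i=1}^{l} \theta_{W_i'}(W_i).
\]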

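Proof. We prove it by induction on $l$.

1) For the basis step, namely, $l = 0$, it is trivial.

2) The induction hypothesis is that the above equation holds for $S = W_1' \cdots W_l' \in
\mathcal{D}(\Sigma)^*$. We now prove the same for $SW_{l+1}'$, i.e., $W_1' \cdots W_l'W_{l+1}'$. Using the definition of
$\delta^{\uparrow}$ and the induction hypothesis, we have the following computation.

\[
\begin{aligned}
\delta^{\uparrow}(p, SW_{l+1}')(q)
&= \delta^{\uparrow}(p, W_1' \cdots W_l'W_{l+1}')(q)\\
&= \Big[ \sum_{q' \in Q} \delta^{\uparrow}(p, W_1' \cdots W_l')(q') \cdot \delta^{\uparrow}(q', W_{l+1}') \Big](q)\\
&= \sum_{q' \in Q} \delta^{\uparrow}(p, W_1' \cdots W_l')(q') \cdot \delta^{\uparrow}(q', W_{l+1}')(q)\\
&= \sum_{q' \in Q} \Big[ \sum_{W_1, \ldots, W_l \in \Sigma_w} \delta(p, W_1 \cdots W_l)(q') \cdot \prod_{i=1}^{l} \theta_{W_i'}(W_i) \Big] \cdot \delta^{\uparrow}(q', W_{l+1}')(q)\\
&= \sum_{q' \in Q} \sum_{W_1, \ldots, W_l \in \Sigma_w} \delta(p, W_1 \cdots W_l)(q') \cdot \delta^{\uparrow}(q', W_{l+1}')(q) \cdot \prod_{i=1}^{l} \theta_{W_i'}(W_i)\\
&= \sum_{q' \in Q} \sum_{W_1, \ldots, W_l \in \Sigma_w} \delta(p, W_1 \cdots W_l)(q') \cdot \Big[ \sum_{W_{l+1} \in \Sigma_w} \theta_{W_{l+1}'}(W_{l+1}) \cdot \delta(q', W_{l+1})(q) \Big] \cdot \prod_{i=1}^{l} \theta_{W_i'}(W_i)\\
&= \sum_{q' \in Q} \sum_{W_1, \ldots, W_l \in \Sigma_w} \sum_{W_{l+1} \in \Sigma_w} \delta(p, W_1 \cdots W_l)(q') \cdot \delta(q', W_{l+1})(q) \cdot \prod_{i=1}^{l+1} \theta_{W_i'}(W_i)\\
&= \sum_{q' \in Q} \sum_{W_1, \ldots, W_{l+1} \in \Sigma_w} \delta(p, W_1 \cdots W_l)(q') \cdot \delta(q', W_{l+1})(q) \cdot \prod_{i=1}^{l+1} \theta_{W_i'}(W_i)\\
&= \sum_{W_1, \ldots, W_{l+1} \in \Sigma_w} \sum_{q' \in Q} \delta(p, W_1 \cdots W_l)(q') \cdot \delta(q', W_{l+1})(q) \cdot \prod_{i=1}^{l+1} \theta_{W_i'}(W_i)\\
&= \sum_{W_1, \ldots, W_{l+1} \in \Sigma_w} \Big[ \sum_{q' \in Q} \delta(p, W_1 \cdots W_l)(q') \cdot \delta(q', W_{l+1})(q) \Big] \cdot \prod_{i=1}^{l+1} \theta_{W_i'}(W_i)\\
&= \sum_{W_1, \ldots, W_{l+1} \in \Sigma_w} \delta(p, W_1 \cdots W_{l+1})(q) \cdot \prod_{i=1}^{l+1} \theta_{W_i'}(W_i),
\end{aligned}
\]

i.e., $\delta^{\uparrow}(p, SW_{l+1}')(q) = \sum_{W_1, \ldots, W_{l+1} \in \Sigma_w} \delta(p, W_1 \cdots W_{l+1})(q) \cdot \prod_{i=1}^{l+1} \theta_{W_i'}(W_i)$. This finishes
the proof of the lemma. $\Box$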

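Based on the above lemma, the proof of Theorem 4.6 is straightforward.

Proof of Theorem 4.6. By the definition of word languages and Lemma 4.8, we
have the following equation.

\[
\begin{aligned}
L_w(M_w^{\uparrow})(S)
&= \sum_{q \in F} \delta^{\uparrow}(q_0, W_1' \cdots W_l')(q)\\
&= \sum_{q \in F} \sum_{W_1, \ldots, W_l \in \Sigma_w} \delta(q_0, W_1 \cdots W_l)(q) \cdot \prod_{i=1}^{l} \theta_{W_i'}(W_i)\\
&= \sum_{W_1, \ldots, W_l \in \Sigma_w} \sum_{q \in F} \delta(q_0, W_1 \cdots W_l)(q) \cdot \prod_{i=1}^{l} \theta_{W_i'}(W_i)\\
&= \sum_{W_1, \ldots, W_l \in \Sigma_w} \Big[ \sum_{q \in F} \delta(q_0, W_1 \cdots W_l)(q) \Big] \cdot \prod_{i=1}^{l} \theta_{W_i'}(W_i)\\
&= \sum_{W_1, \ldots, W_l \in \Sigma_w} L_w(M_w)(W_1 \cdots W_l) \cdot \prod_{i=1}^{l} \theta_{W_i'}(W_i),
\end{aligned}
\]

which proves the theorem. $\Box$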

After discussing the generalized extensions of PACWs under the probability
interpretation of words, we point out some slight modifications for the case of the possibility
interpretation of words.

Remark 4.9. If the words in this subsection are interpreted as possibility distributions,
then to make the definition of generalized extensions valid, we have to modify Definition
4.1 by substituting $\|W'\|(\sigma)$ for $W'(\sigma)$, where $\|W'\|(\sigma)$ stands for $W'(\sigma)/\sum_{\tau \in \Sigma} W'(\tau)$.
It is not hard to check that this substitution is a prerequisite for $\delta^{\uparrow}(p, W') \in \mathcal{D}(Q)$.
Correspondingly, the notation $\theta_{W'}(W)$ denoting $\sum_{\sigma \in \Sigma} W'(\sigma) \cdot \chi_\sigma(W)$ is replaced by
$\theta_{W'}(W) = \sum_{\sigma \in \Sigma} \|W'\|(\sigma) \cdot \chi_\sigma(W)$. In addition, since we are interpreting words as
possibility distributions, it is natural to require that $\Sigma_w \subseteq \mathcal{F}(\Sigma)$ and replace $\mathcal{D}(\Sigma)$
by $\mathcal{F}(\Sigma)$. With these substitutions, all of the results in this subsection hold, except
for Proposition 4.4 and Corollary 4.5, which need more modifications. For example,
Definition 4.1 can be stated as follows.

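Definition 4.1$'$. Let $M_w = (Q, \Sigma_w, \delta, q_0, F)$ be a PACW with $\Sigma_w \subseteq \mathcal{F}(\Sigma)$. The
generalized extension of $M_w$ is a PACAW $M_w^{\uparrow} = (Q, \mathcal{F}(\Sigma), \delta^{\uparrow}, q_0, F)$, where:

(a) The components $Q, q_0, F$ are the same as those of $M_w$.

(b) $\mathcal{F}(\Sigma)$ consists of all possibility distributions (i.e., fuzzy subsets) over the underlying
input alphabet of $M_w$.

(c) $\delta^{\uparrow}$, the transition probability function, is a mapping from $Q \times \mathcal{F}(\Sigma)$ to $\mathcal{D}(Q)$ defined
by

\[
\delta^{\uparrow}(p, W') = \sum_{W \in \Sigma_w} \theta_{W'}(W) \cdot \delta(p, W)
\]

for any $(p, W') \in Q \times \mathcal{F}(\Sigma)$, where $\theta_{W'}(W) = \sum_{\sigma \in \Sigma} \|W'\|(\sigma) \cdot [W(\sigma)/\sum_{U \in \Sigma_w} U(\sigma)]$.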
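
The role of the normalization $\|W'\|$ can be seen in a small numerical sketch. Everything concrete below is hypothetical: the two fuzzy words in `FUZZY_WORDS`, the input `w_prime`, and the function names are sample choices that do not come from the paper. The sketch only checks that the weights $\theta_{W'}(W)$ sum to 1 after normalization, which is what makes $\delta^{\uparrow}(p, W')$ a probability distribution.

```python
def normalize(word):
    """||W'||(sigma) = W'(sigma) / sum over tau of W'(tau)."""
    total = sum(word.values())
    if total <= 0:
        raise ValueError("a possibility distribution needs a positive total")
    return {sigma: value / total for sigma, value in word.items()}


def theta(w_prime, name, words):
    """theta_{W'}(W) with ||W'|| in place of W', as in the possibility case."""
    norm = normalize(w_prime)
    return sum(
        norm[s] * words[name][s] / sum(u[s] for u in words.values())
        for s in norm
    )


# Hypothetical possibility distributions over {a, b}; entries need not sum to 1.
FUZZY_WORDS = {"W1": {"a": 1.0, "b": 0.3}, "W2": {"a": 0.2, "b": 1.0}}
w_prime = {"a": 0.6, "b": 1.0}

weights = {name: theta(w_prime, name, FUZZY_WORDS) for name in FUZZY_WORDS}
print({name: round(value, 6) for name, value in weights.items()})
print(round(sum(weights.values()), 6))
```

Without the call to `normalize`, the weights for this input would sum to $1.6$ rather than $1$, so the resulting vector would not be a distribution on $Q$.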

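In terms of possibility distributions, Proposition 4.4 and Corollary 4.5 can be stated
as follows.

Proposition 4.4$'$. Suppose that the linear combination $W' = \sum_{i=1}^{l} k_iW_i'$ with $W_i' \in
\mathcal{F}(\Sigma)$ is a word in $\mathcal{F}(\Sigma)$. Then

\[
\delta^{\uparrow}(p, W') = \sum_{i=1}^{l} \frac{k_i \sum_{\sigma \in \Sigma} W_i'(\sigma)}{\sum_{\sigma \in \Sigma} W'(\sigma)} \cdot \delta^{\uparrow}(p, W_i').
\]

Corollary 4.5$'$. To compute $\delta^{\uparrow}(p, W')$ for any $(p, W') \in Q \times \mathcal{F}(\Sigma)$, we can follow the
steps below:

(1) Find any $n$ linearly independent words in $\mathcal{F}(\Sigma)$, say $W_1', \ldots, W_n'$, and write $W' =
k_1W_1' + \cdots + k_nW_n'$.

(2) Compute $\delta^{\uparrow}(p, W_i')$ for all $i = 1, \ldots, n$.

(3) Set $k_i' = k_i \sum_{\sigma \in \Sigma} W_i'(\sigma) / \sum_{\sigma \in \Sigma} W'(\sigma)$, and then compute the sum $\sum_{i=1}^{n} k_i' \cdot \delta^{\uparrow}(p, W_i')$, which
exactly equals $\delta^{\uparrow}(p, W')$.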

4.2 Analytical properties of the generalized extensions 

In this subsection, we pay attention to two analytical properties of the generalized 
extensions. Intuitively, the analytical properties show us that the transition probabilities 
of two near inputs are near as well. 

Let us keep the assumption that $\Sigma = \{\sigma_1, \sigma_2, \ldots, \sigma_n\}$. Recall that each
word $W \in \mathcal{D}(\Sigma)$ can be identified with an $n$-dimensional stochastic row vector
$[W(\sigma_1), W(\sigma_2), \ldots, W(\sigma_n)]$. Since such a stochastic row vector corresponds to a point in
the $n$-dimensional Euclidean space $\mathbb{R}^n$ (in fact, $\mathcal{D}(\Sigma)$ is exactly the $(n-1)$-dimensional
simplex), we can discuss the distance between two words in $\mathcal{D}(\Sigma)$. More explicitly, for
any two words $W'$ and $W''$ in $\mathcal{D}(\Sigma)$, it follows from the definition of the Euclidean metric
that the distance $d(W', W'')$ is given by

\[
d(W', W'') = \sqrt{\sum_{i=1}^{n} (k_i' - k_i'')^2},
\]

where $k_i' = W'(\sigma_i)$ and $k_i'' = W''(\sigma_i)$.

The result below shows that if two words are near enough, then the transition
probabilities when inputting them at the same state are nearby. In this subsection, we abuse
the notation $\varepsilon$ and $\delta$ used in PACWs as well as in the following epsilon-delta language,
and also abuse the notation "$|\;|$" to denote the cardinality of a set and the absolute
value of a real number.

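Proposition 4.10. For any $p, q \in Q$, the function $\delta^{\uparrow}(p, x)(q)$ is uniformly continuous
on $\mathcal{D}(\Sigma)$. In other words, for any $\varepsilon > 0$, there exists a $\delta > 0$ such that $|\delta^{\uparrow}(p, W')(q) -
\delta^{\uparrow}(p, W'')(q)| < \varepsilon$ for any $p, q \in Q$, whenever $d(W', W'') < \delta$ with $W', W'' \in \mathcal{D}(\Sigma)$.

Proof. Suppose that $W'(\sigma_i) = k_i'$ and $W''(\sigma_i) = k_i''$ for $i = 1, \ldots, n$. Taking
$\delta = \varepsilon/\sqrt{n}$, we see by definition that $d(W', W'') < \delta$ means $\sqrt{\sum_{i=1}^{n} (k_i' - k_i'')^2} < \varepsilon/\sqrt{n}$. Moreover, using the
equation (2) we have that

\[
\begin{aligned}
|\delta^{\uparrow}(p, W')(q) - \delta^{\uparrow}(p, W'')(q)|
&= \Big| \sum_{i=1}^{n} k_i' \cdot \delta^{\uparrow}(p, \hat{\sigma}_i)(q) - \sum_{i=1}^{n} k_i'' \cdot \delta^{\uparrow}(p, \hat{\sigma}_i)(q) \Big|\\
&= \Big| \sum_{i=1}^{n} (k_i' - k_i'') \cdot \delta^{\uparrow}(p, \hat{\sigma}_i)(q) \Big|\\
&\le \sum_{i=1}^{n} |(k_i' - k_i'') \cdot \delta^{\uparrow}(p, \hat{\sigma}_i)(q)|\\
&= \sum_{i=1}^{n} |k_i' - k_i''| \cdot \delta^{\uparrow}(p, \hat{\sigma}_i)(q)\\
&\le \sum_{i=1}^{n} |k_i' - k_i''|.
\end{aligned}
\]

Note that the arithmetic mean of a set of values is not greater than the root-mean-square
of these values, i.e., $|(\sum_{i=1}^{n} a_i)/n| \le \sqrt{(\sum_{i=1}^{n} a_i^2)/n}$ for any $a_1, \ldots, a_n \in \mathbb{R}$. We thus get
that

\[
\sum_{i=1}^{n} |k_i' - k_i''| \le \sqrt{n} \cdot \sqrt{\sum_{i=1}^{n} (k_i' - k_i'')^2} < \sqrt{n} \cdot \frac{\varepsilon}{\sqrt{n}} = \varepsilon,
\]

that is, $|\delta^{\uparrow}(p, W')(q) - \delta^{\uparrow}(p, W'')(q)| < \varepsilon$, finishing the proof. $\Box$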

To give an illustration, let us examine the transition probability function of $\mathcal{M}^{\uparrow}$
obtained in Example 4.2.

Example 4.11. Keep the data in Example 4.7, where we have obtained all transition
probabilities $\delta^{\uparrow}(p, W')(q)$ for $W' = 0.2\backslash a + 0.8\backslash b$ and any $p, q \in \{q_0, q_1, q_2\}$; for instance,

\[
\delta^{\uparrow}(q_1, W') = 0.141\backslash q_0 + 0.55\backslash q_1 + 0.309\backslash q_2.
\]

Let us consider which words $W'' \in \mathcal{D}(\{a, b\})$ can be inputted at any state $p \in \{q_0, q_1, q_2\}$
such that $|\delta^{\uparrow}(p, W')(q) - \delta^{\uparrow}(p, W'')(q)| < 0.001$ for all $q \in \{q_0, q_1, q_2\}$. To this end, we
take $\varepsilon = 0.001$ and apply Proposition 4.10. Let $W'' = \alpha\backslash a + (1 - \alpha)\backslash b \in \mathcal{D}(\{a, b\})$. By
the proof of Proposition 4.10, taking $\delta = \sqrt{2}/2000$, we get that $|\delta^{\uparrow}(p, W')(q) -
\delta^{\uparrow}(p, W'')(q)| < 0.001$, whenever $d(W', W'') < \delta$. The latter is equivalent to $\alpha \in
(0.1995, 0.2005)$ by a routine computation. In summary, for any $W'' = \alpha\backslash a + (1 - \alpha)\backslash b$
with $\alpha \in (0.1995, 0.2005)$, we have that $|\delta^{\uparrow}(p, W')(q) - \delta^{\uparrow}(p, W'')(q)| < 0.001$ for any
$p, q \in \{q_0, q_1, q_2\}$.

More concretely, take $\alpha = 0.2004 \in (0.1995, 0.2005)$ and $p = q_1$ as an example. It
follows from the formula in Example 4.2 that

\[
\begin{aligned}
\delta^{\uparrow}(q_1, W'') &= (0.085 + 0.28\alpha)\backslash q_0 + 0.55\backslash q_1 + (0.365 - 0.28\alpha)\backslash q_2\\
&= 0.141112\backslash q_0 + 0.55\backslash q_1 + 0.308888\backslash q_2.
\end{aligned}
\]

As a result, we see that

\[
\begin{aligned}
|\delta^{\uparrow}(q_1, W')(q_0) - \delta^{\uparrow}(q_1, W'')(q_0)| &= |0.141 - 0.141112| = 0.000112 < 0.001,\\
|\delta^{\uparrow}(q_1, W')(q_1) - \delta^{\uparrow}(q_1, W'')(q_1)| &= |0.55 - 0.55| = 0 < 0.001,\\
|\delta^{\uparrow}(q_1, W')(q_2) - \delta^{\uparrow}(q_1, W'')(q_2)| &= |0.309 - 0.308888| = 0.000112 < 0.001,
\end{aligned}
\]

as desired.
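
The bound used in this example can be checked for all states at once. The sketch below is an illustration under assumptions: the rows of $\delta^{\downarrow}$ are copied from Example 3.2, the choice $\delta = \varepsilon/\sqrt{n}$ follows the proof of Proposition 4.10, and the function names are hypothetical.

```python
import math

# Rows of delta_down from Example 3.2, ordered as (q0, q1, q2).
DELTA_DOWN = {
    ("q0", "a"): (0.68, 0.265, 0.055),
    ("q0", "b"): (0.12, 0.785, 0.095),
    ("q1", "a"): (0.365, 0.55, 0.085),
    ("q1", "b"): (0.085, 0.55, 0.365),
    ("q2", "a"): (0.095, 0.775, 0.13),
    ("q2", "b"): (0.055, 0.175, 0.77),
}


def extend(p, alpha):
    """delta_up(p, alpha\\a + (1 - alpha)\\b) by equation (2)."""
    return [alpha * x + (1 - alpha) * y
            for x, y in zip(DELTA_DOWN[(p, "a")], DELTA_DOWN[(p, "b")])]


def distance(w1, w2):
    """Euclidean distance between two words given as coordinate tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(w1, w2)))


def delta_for(eps, n):
    """The radius eps / sqrt(n) used in the proof of Proposition 4.10."""
    return eps / math.sqrt(n)


eps = 0.001
radius = delta_for(eps, 2)
alpha1, alpha2 = 0.2, 0.2004
d = distance((alpha1, 1 - alpha1), (alpha2, 1 - alpha2))
worst = max(
    abs(x - y)
    for p in ("q0", "q1", "q2")
    for x, y in zip(extend(p, alpha1), extend(p, alpha2))
)
print(d < radius, worst < eps)
```

The radius $\varepsilon/\sqrt{n}$ is a sufficient choice, not a tight one: here the largest deviation over all states is well below $\varepsilon$.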

We further compare the accepting probabilities of two strings of near words. As
expected, if all corresponding components of two strings are near enough, then the
accepting probabilities of the two strings are nearby. To state this in mathematical
terms, we need a metric space which makes the accepting probabilities of two strings
comparable.

Notice that the set $\mathcal{D}(\Sigma)$ with the Euclidean metric $d$ gives rise to a metric space.
For any $l \ge 1$, denote by $\mathcal{D}(\Sigma)^l$ the Cartesian product of $l$ copies of $\mathcal{D}(\Sigma)$; any element
$(W_1', \ldots, W_l') \in \mathcal{D}(\Sigma)^l$ will be written as $W_1' \cdots W_l'$. We define a function

\[
\begin{aligned}
d_l : \mathcal{D}(\Sigma)^l \times \mathcal{D}(\Sigma)^l &\to \mathbb{R}\\
(W_1' \cdots W_l', W_1'' \cdots W_l'') &\mapsto \max_{1 \le i \le l} d(W_i', W_i'').
\end{aligned}
\]

It is easy to check that $d_l$ is a metric on $\mathcal{D}(\Sigma)^l$, which makes $\mathcal{D}(\Sigma)^l$ into a metric
space. Observe that for any given $M_w$, the word language of $M_w^{\uparrow}$ gives a function
$L_w(M_w^{\uparrow})|_{\mathcal{D}(\Sigma)^l} : \mathcal{D}(\Sigma)^l \to \mathbb{R}$ that maps $W_1' \cdots W_l'$ to $L_w(M_w^{\uparrow})(W_1' \cdots W_l')$. Such a
function has a good property, as shown below.

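Proposition 4.12. For any $l \ge 1$, the function $L_w(M_w^{\uparrow})|_{\mathcal{D}(\Sigma)^l}$ is uniformly
continuous on $\mathcal{D}(\Sigma)^l$. In other words, for any $\varepsilon > 0$, there exists a $\delta > 0$ such that
$|L_w(M_w^{\uparrow})(W_1' \cdots W_l') - L_w(M_w^{\uparrow})(W_1'' \cdots W_l'')| < \varepsilon$, whenever $d(W_i', W_i'') < \delta$ holds for
every pair $W_i', W_i'' \in \mathcal{D}(\Sigma)$, where $1 \le i \le l$.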

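Proof. What we actually prove first, by induction on $l$, is the following claim: for
any $\varepsilon > 0$, there exists a $\delta > 0$ such that $|\delta^{\uparrow}(p, W_1' \cdots W_l')(q) - \delta^{\uparrow}(p, W_1'' \cdots W_l'')(q)| < \varepsilon$
for any $p, q \in Q$, whenever $d(W_i', W_i'') < \delta$ holds for every pair $W_i', W_i'' \in \mathcal{D}(\Sigma)$, where
$1 \le i \le l$.

1) The case of $l = 1$ follows immediately from Proposition 4.10.

2) Assume that the statement holds for the case of $l - 1$. We now consider the case of
$l$. Given any $\varepsilon > 0$, it follows from the induction assumption that for $\varepsilon/(2|Q|)$, there
exists a $\delta' > 0$ such that $|\delta^{\uparrow}(p, W_1' \cdots W_{l-1}')(q) - \delta^{\uparrow}(p, W_1'' \cdots W_{l-1}'')(q)| < \varepsilon/(2|Q|)$ for
any $p, q \in Q$, whenever $d(W_i', W_i'') < \delta'$ holds for every pair $W_i', W_i'' \in \mathcal{D}(\Sigma)$, where
$1 \le i \le l - 1$. It also follows from the basis step or Proposition 4.10 that for the given $\varepsilon$,
there exists a $\delta'' > 0$ such that $|\delta^{\uparrow}(p, W_l')(q) - \delta^{\uparrow}(p, W_l'')(q)| < \varepsilon/(2|Q|)$ for any $p, q \in Q$,
whenever $d(W_l', W_l'') < \delta''$. Taking $\delta = \min\{\delta', \delta''\}$, we get that

\[
\begin{aligned}
&|\delta^{\uparrow}(p, W_1' \cdots W_{l-1}'W_l')(q) - \delta^{\uparrow}(p, W_1'' \cdots W_{l-1}''W_l'')(q)|\\
&= \Big| \sum_{q' \in Q} \delta^{\uparrow}(p, W_1' \cdots W_{l-1}')(q') \cdot \delta^{\uparrow}(q', W_l')(q) - \sum_{q' \in Q} \delta^{\uparrow}(p, W_1'' \cdots W_{l-1}'')(q') \cdot \delta^{\uparrow}(q', W_l'')(q) \Big|\\
&= \Big| \sum_{q' \in Q} \big[ \delta^{\uparrow}(p, W_1' \cdots W_{l-1}')(q') \cdot \delta^{\uparrow}(q', W_l')(q) - \delta^{\uparrow}(p, W_1'' \cdots W_{l-1}'')(q') \cdot \delta^{\uparrow}(q', W_l'')(q) \big] \Big|\\
&\le \sum_{q' \in Q} \big| \delta^{\uparrow}(p, W_1' \cdots W_{l-1}')(q') \cdot \delta^{\uparrow}(q', W_l')(q) - \delta^{\uparrow}(p, W_1'' \cdots W_{l-1}'')(q') \cdot \delta^{\uparrow}(q', W_l'')(q) \big|\\
&= \sum_{q' \in Q} \big| \delta^{\uparrow}(q', W_l')(q) \cdot [\delta^{\uparrow}(p, W_1' \cdots W_{l-1}')(q') - \delta^{\uparrow}(p, W_1'' \cdots W_{l-1}'')(q')]\\
&\qquad\quad + \delta^{\uparrow}(p, W_1'' \cdots W_{l-1}'')(q') \cdot [\delta^{\uparrow}(q', W_l')(q) - \delta^{\uparrow}(q', W_l'')(q)] \big|\\
&\le \sum_{q' \in Q} \big\{ |\delta^{\uparrow}(q', W_l')(q) \cdot [\delta^{\uparrow}(p, W_1' \cdots W_{l-1}')(q') - \delta^{\uparrow}(p, W_1'' \cdots W_{l-1}'')(q')]|\\
&\qquad\quad + |\delta^{\uparrow}(p, W_1'' \cdots W_{l-1}'')(q') \cdot [\delta^{\uparrow}(q', W_l')(q) - \delta^{\uparrow}(q', W_l'')(q)]| \big\}\\
&\le \sum_{q' \in Q} \big\{ |\delta^{\uparrow}(p, W_1' \cdots W_{l-1}')(q') - \delta^{\uparrow}(p, W_1'' \cdots W_{l-1}'')(q')| + |\delta^{\uparrow}(q', W_l')(q) - \delta^{\uparrow}(q', W_l'')(q)| \big\}\\
&< \sum_{q' \in Q} \Big\{ \frac{\varepsilon}{2|Q|} + \frac{\varepsilon}{2|Q|} \Big\} = \varepsilon,
\end{aligned}
\]

namely, $|\delta^{\uparrow}(p, W_1' \cdots W_{l-1}'W_l')(q) - \delta^{\uparrow}(p, W_1'' \cdots W_{l-1}''W_l'')(q)| < \varepsilon$. This completes the
proof of the claim.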

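Now, let us use the claim and the definition of word languages to prove the
proposition. For any $\varepsilon > 0$, it follows from the claim that there exists a $\delta > 0$ such
that $|\delta^{\uparrow}(q_0, W_1' \cdots W_l')(q) - \delta^{\uparrow}(q_0, W_1'' \cdots W_l'')(q)| < \varepsilon/|F|$ for any $q \in F$, whenever
$d(W_i', W_i'') < \delta$ holds for every pair $W_i', W_i'' \in \mathcal{D}(\Sigma)$, where $1 \le i \le l$. By the definition
of word languages, we have that

\[
\begin{aligned}
|L_w(M_w^{\uparrow})(W_1' \cdots W_l') - L_w(M_w^{\uparrow})(W_1'' \cdots W_l'')|
&= \Big| \sum_{q \in F} \delta^{\uparrow}(q_0, W_1' \cdots W_l')(q) - \sum_{q \in F} \delta^{\uparrow}(q_0, W_1'' \cdots W_l'')(q) \Big|\\
&= \Big| \sum_{q \in F} \big[ \delta^{\uparrow}(q_0, W_1' \cdots W_l')(q) - \delta^{\uparrow}(q_0, W_1'' \cdots W_l'')(q) \big] \Big|\\
&\le \sum_{q \in F} |\delta^{\uparrow}(q_0, W_1' \cdots W_l')(q) - \delta^{\uparrow}(q_0, W_1'' \cdots W_l'')(q)|\\
&< \sum_{q \in F} \frac{\varepsilon}{|F|} = \varepsilon,
\end{aligned}
\]

which finishes the proof of the proposition. $\Box$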

Remark 4.13. When the words are interpreted as possibility distributions, one can
readily show that Proposition 4.10 and Proposition 4.12 remain valid by replacing $\mathcal{D}(\Sigma)$
with $\mathcal{F}(\Sigma)$.
5 Retractions and generalized extensions of PGCWs

In Sections 3 and 4, we introduced the concepts of retractions and generalized ex- 
tensions of PACWs. In fact, these concepts are appropriate for PGCWs. We briefly 
discuss them in this section. 

Motivated by the retractions and the generalized extensions of PACWs, we can 
directly define the retractions and the generalized extensions of PGCWs as follows. 

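Definition 5.1. Let $G_w = (V, \Sigma_w, P_w, S)$ be a PGCW. The retraction of $G_w$ is a
probabilistic grammar $G_w^{\downarrow} = (V, \Sigma, P, S)$, where $\Sigma$ is the underlying terminal set of $\Sigma_w$, and

\[
P = \{A \to aB : \exists W \in \Sigma_w \text{ such that } A \to WB \in P_w \text{ and } W(a) \neq 0\} \cup \{A \to \varepsilon : A \in V\}
\]

with

\[
\begin{aligned}
\Pr{}^{\downarrow}(A \to aB) &= \sum_{W \in \Sigma_w} \frac{W(a)}{\sum_{U \in \Sigma_w} U(a)} \cdot \Pr(A \to WB),\\
\Pr{}^{\downarrow}(A \to \varepsilon) &= \Pr(A \to \varepsilon).
\end{aligned}
\]

Analogously, we have the following definition.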

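Definition 5.2. The generalized extension of a PGCW $G_w = (V, \Sigma_w, P_w, S)$ is a
probabilistic grammar for computing with all words (PGCAW), denoted $G_w^{\uparrow} = (V, \mathcal{D}(\Sigma), P, S)$,
where $\mathcal{D}(\Sigma)$ consists of all probability distributions over the underlying terminal set of
$\Sigma_w$, and

\[
P = \{A \to W'B : A, B \in V, W' \in \mathcal{D}(\Sigma)\} \cup \{A \to \varepsilon : A \in V\}
\]

with

\[
\begin{aligned}
\Pr{}^{\uparrow}(A \to W'B) &= \sum_{W \in \Sigma_w} \theta_{W'}(W) \cdot \Pr(A \to WB),\\
\Pr{}^{\uparrow}(A \to \varepsilon) &= \Pr(A \to \varepsilon).
\end{aligned}
\]

In the above, $\theta_{W'}(W) = \sum_{\sigma \in \Sigma} W'(\sigma) \cdot [W(\sigma)/\sum_{U \in \Sigma_w} U(\sigma)]$, which is the same as in
Definition 4.1.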

In terms of PGCWs, there is a retraction principle from computing with words to 
computing with values. 

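Theorem 5.3. Suppose that $G_w = (V, \Sigma_w, P_w, S)$ is a PGCW and $G_w^{\downarrow} = (V, \Sigma, P, S)$ is
the retraction of $G_w$. Then for any $s = \sigma_1 \cdots \sigma_l \in \Sigma^*$, we have that

\[
L(G_w^{\downarrow})(s) = \sum_{W_1, \ldots, W_l \in \Sigma_w} L_w(G_w)(W_1 \cdots W_l) \cdot \prod_{i=1}^{l} \chi_{\sigma_i}(W_i),
\]

where $\chi_{\sigma_i}(W_i) = W_i(\sigma_i)/\sum_{U \in \Sigma_w} U(\sigma_i)$.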

The generalized extension principle for PGCWs can be stated as follows. 
Theorem 5.4. Suppose that $G_w = (V, \Sigma_w, P_w, S)$ is a PGCW and $G_w^{\uparrow} = (V, \mathcal{D}(\Sigma), P, S)$
is the generalized extension of $G_w$. Then for any $s = W_1' \cdots W_l' \in \mathcal{D}(\Sigma)^*$, we have that

\[
L_w(G_w^{\uparrow})(s) = \sum_{W_1, \ldots, W_l \in \Sigma_w} L_w(G_w)(W_1 \cdots W_l) \cdot \prod_{i=1}^{l} \theta_{W_i'}(W_i),
\]

where $\theta_{W_i'}(W_i) = \sum_{\sigma \in \Sigma} W_i'(\sigma) \cdot [W_i(\sigma)/\sum_{U \in \Sigma_w} U(\sigma)]$.

The proofs of Theorems 5.3 and 5.4 follow immediately from Theorem 3.3, Theorem
4.6, and the results obtained in the next section, so we leave them to the end of Section
6.



6 Retractions and generalized extensions are equivalence- 
preserving 

As mentioned in Introduction, there are three kinds of equivalences among PACVs 
and probabilistic grammars: the equivalence between two PACVs, the equivalence be- 
tween two probabilistic grammars, and the equivalence between PACVs and probabilistic 
grammars; see Figure El In this section, we examine the preservation of these equiva- 
lences under retractions and generalized extensions. 

Let us start with the following definition. 

Definition 6.1. Two PACVs $M' = (Q', \Sigma, \delta', q_0', F')$ and $M'' = (Q'', \Sigma, \delta'', q_0'', F'')$ are
equivalent if $L(M')(s) = L(M'')(s)$ for all strings $s \in \Sigma^*$; two probabilistic grammars
$G' = (V', \Sigma, P', S')$ and $G'' = (V'', \Sigma, P'', S'')$ are equivalent if $L(G')(s) = L(G'')(s)$ for
all strings $s \in \Sigma^*$.

Recall that the PACV $M'$ is said to be equivalent to the probabilistic grammar $G'$
if $L(M')(s) = L(G')(s)$ for all strings $s \in \Sigma^*$. Clearly, these definitions are applicable to
PACWs and PACAWs (and also PGCWs and PGCAWs) by replacing $\Sigma$ with $\Sigma_w$ and
$\mathcal{D}(\Sigma)$, respectively. The next proposition shows that if two PACWs are equivalent,
then so are their retractions and generalized extensions; this corresponds to the two left
boxes in Figure 3.

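Proposition 6.2. Suppose that $M_w' = (Q', \Sigma_w, \delta', q_0', F')$ and $M_w'' = (Q'', \Sigma_w, \delta'', q_0'', F'')$
are two equivalent PACWs. Then:

(a) $M_w'^{\downarrow} = (Q', \Sigma, \delta'^{\downarrow}, q_0', F')$ and $M_w''^{\downarrow} = (Q'', \Sigma, \delta''^{\downarrow}, q_0'', F'')$ are equivalent.

(b) $M_w'^{\uparrow} = (Q', \mathcal{D}(\Sigma), \delta'^{\uparrow}, q_0', F')$ and $M_w''^{\uparrow} = (Q'', \mathcal{D}(\Sigma), \delta''^{\uparrow}, q_0'', F'')$ are equivalent.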

Proof. The proofs of (a) and (b) are similar, so we only prove assertion (a). The
hypothesis means that $L_w(M_w')(W_1 \cdots W_l) = L_w(M_w'')(W_1 \cdots W_l)$ for all
$W_1 \cdots W_l \in \Sigma_w^*$. To show that $M_w'^{\downarrow}$ and $M_w''^{\downarrow}$ are equivalent, by Definition 6.1 we
need to verify that $L(M_w'^{\downarrow})(s) = L(M_w''^{\downarrow})(s)$ for any $s = \sigma_1 \cdots \sigma_l \in \Sigma^*$. It follows
from Theorem 3.3 and the hypothesis that

$$\begin{aligned}
L(M_w'^{\downarrow})(s) &= \sum_{W_1, \ldots, W_l \in \Sigma_w} L_w(M_w')(W_1 \cdots W_l) \cdot \prod_{i=1}^{l} \chi_{\sigma_i}(W_i) \\
&= \sum_{W_1, \ldots, W_l \in \Sigma_w} L_w(M_w'')(W_1 \cdots W_l) \cdot \prod_{i=1}^{l} \chi_{\sigma_i}(W_i) \\
&= L(M_w''^{\downarrow})(s),
\end{aligned}$$

as desired. This finishes the proof of the proposition. □

Figure 3: Equivalences under retractions and generalized extensions.
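Proposition 6.2(a) can be illustrated with two small automata. This is a hedged sketch, not the paper's construction: the automata `M1` and `M2` are hypothetical (`M2` is obtained from `M1` by splitting state `t` into two interchangeable copies), and the acceptance probability is assumed to be the probability mass on final states after reading the input. Agreement is tested only on inputs up to a bounded length, which is evidence of equivalence rather than a proof.

```python
from fractions import Fraction as F
from itertools import product

SIGMA = "ab"
WORDS = {  # hypothetical words over SIGMA
    "low":  {"a": F(3, 4), "b": F(1, 4)},
    "high": {"a": F(1, 4), "b": F(3, 4)},
}

def chi(sym, w):
    """chi_sigma(W) = W(sigma) / sum_U U(sigma)."""
    return WORDS[w][sym] / sum(u[sym] for u in WORDS.values())

def accept(delta, start, final, letters):
    """Probability of ending in a final state after reading `letters` (assumed semantics)."""
    dist = {start: F(1)}
    for x in letters:
        nxt = {}
        for p, mass in dist.items():
            for q, t in delta[(p, x)].items():
                nxt[q] = nxt.get(q, F(0)) + mass * t
        dist = nxt
    return sum(mass for q, mass in dist.items() if q in final)

def retract(delta):
    """delta_down(p, sigma) = sum_W chi_sigma(W) * delta(p, W)."""
    out = {}
    for (p, w), row in delta.items():
        for sym in SIGMA:
            cell = out.setdefault((p, sym), {})
            for q, t in row.items():
                cell[q] = cell.get(q, F(0)) + chi(sym, w) * t
    return out

M1 = {("s", "low"): {"s": F(1, 2), "t": F(1, 2)}, ("s", "high"): {"t": F(1)},
      ("t", "low"): {"s": F(1, 3), "t": F(2, 3)}, ("t", "high"): {"s": F(1)}}
M2 = {("s", "low"): {"s": F(1, 2), "t1": F(1, 4), "t2": F(1, 4)},
      ("s", "high"): {"t1": F(1, 2), "t2": F(1, 2)}}
for twin in ("t1", "t2"):
    M2[(twin, "low")] = {"s": F(1, 3), "t1": F(1, 3), "t2": F(1, 3)}
    M2[(twin, "high")] = {"s": F(1)}

def agree(d1, f1, d2, f2, alphabet, max_len=4):
    """Compare acceptance probabilities on all inputs of length <= max_len."""
    return all(accept(d1, "s", f1, xs) == accept(d2, "s", f2, xs)
               for n in range(max_len + 1)
               for xs in product(alphabet, repeat=n))

words_agree = agree(M1, {"t"}, M2, {"t1", "t2"}, WORDS)
values_agree = agree(retract(M1), {"t"}, retract(M2), {"t1", "t2"}, SIGMA)
print(words_agree, values_agree)
```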

We continue to discuss the preservation of equivalence between PACWs and PGCWs,
which corresponds to the two in-between boxes in Figure 3.

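Proposition 6.3. Let $M_w' = (Q', \Sigma_w, \delta', q_0', F')$ be a PACW and $G_w' = (V', \Sigma_w, P_w', S')$
be a PGCW. If $M_w'$ is equivalent to $G_w'$, then:

(a) $M_w'^{\downarrow} = (Q', \Sigma, \delta'^{\downarrow}, q_0', F')$ and $G_w'^{\downarrow} = (V', \Sigma, P'^{\downarrow}, S')$ are equivalent.

(b) $M_w'^{\uparrow} = (Q', \mathcal{D}(\Sigma), \delta'^{\uparrow}, q_0', F')$ and $G_w'^{\uparrow} = (V', \mathcal{D}(\Sigma), P'^{\uparrow}, S')$ are equivalent.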

Proof. We only prove (a); part (b) can be proved in a similar way. Using the
construction in Section 2.3, we have the PACW $M_1 = (V', \Sigma_w, \delta_1, S', F_1)$ induced from
$G_w'$, where $F_1 = \{A' \in V' : Pr'(A' \to \varepsilon) = 1\}$ and $\delta_1$ is defined by
$\delta_1(A', W)(B') = Pr'(A' \to WB')$ for any $A', B' \in V'$ and $W \in \Sigma_w$. Therefore, $M_w'$ and $M_1$ are
equivalent. We thus get by Proposition 6.2 that $M_w'^{\downarrow}$ and $M_1^{\downarrow}$ are equivalent. Consequently,
to prove (a), it is sufficient to show that $M_1^{\downarrow}$ and $G_w'^{\downarrow}$ are equivalent. We verify it by
showing that $M_1^{\downarrow}$ is equivalent to the PACV, say $M_2$, induced from $G_w'^{\downarrow}$.



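By definition, $M_1^{\downarrow} = (V', \Sigma, \delta_1^{\downarrow}, S', F_1)$, where

$$\delta_1^{\downarrow}(A', \sigma)(B') = \sum_{W \in \Sigma_w} \frac{W(\sigma)}{\sum_{U \in \Sigma_w} U(\sigma)} \cdot \delta_1(A', W)(B') = \sum_{W \in \Sigma_w} \frac{W(\sigma)}{\sum_{U \in \Sigma_w} U(\sigma)} \cdot Pr'(A' \to WB')$$

for any $A', B' \in V'$ and $\sigma \in \Sigma$. By the definition of retractions, we see that in $G_w'^{\downarrow}$,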

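$$P'^{\downarrow} = \{A' \to \sigma B' : \exists W \in \Sigma_w \text{ such that } A' \to WB' \in P_w' \text{ and } W(\sigma) \neq 0\} \cup \{A' \to \varepsilon : A' \in V'\}$$

with

$$Pr'^{\downarrow}(A' \to \sigma B') = \sum_{W \in \Sigma_w} \frac{W(\sigma)}{\sum_{U \in \Sigma_w} U(\sigma)} \cdot Pr'(A' \to WB'),$$
$$Pr'^{\downarrow}(A' \to \varepsilon) = Pr'(A' \to \varepsilon).$$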



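Again, using the construction in Section 2.3, we have the PACV $M_2 = (V', \Sigma, \delta_2, S', F_2)$
induced from $G_w'^{\downarrow}$, where

$$F_2 = \{A' \in V' : Pr'^{\downarrow}(A' \to \varepsilon) = 1\} = \{A' \in V' : Pr'(A' \to \varepsilon) = 1\}$$

and $\delta_2$ is defined by

$$\delta_2(A', \sigma)(B') = Pr'^{\downarrow}(A' \to \sigma B')$$

for any $A', B' \in V'$ and $\sigma \in \Sigma$. Whence, $F_2 = F_1$ and $\delta_2 = \delta_1^{\downarrow}$. So we get that
$M_1^{\downarrow} = M_2$, finishing the proof of part (a). □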
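The key step of the proof, that inducing an automaton from a grammar commutes with taking retractions, can be mirrored in a few lines. The sketch below uses a hypothetical grammar and hypothetical helper names; productions are stored as a map from `(A, input, B)` to a probability, and zero-probability entries are dropped on both sides so that the two results can be compared as plain dictionaries.

```python
from fractions import Fraction as F

SIGMA = "ab"
WORDS = {  # hypothetical words; "high" gives no weight to "a"
    "low":  {"a": F(3, 4), "b": F(1, 4)},
    "high": {"a": F(0), "b": F(1)},
}
PR = {  # Pr'(A' -> W B') of a toy PGCW
    ("S", "low", "T"): F(1),
    ("S", "high", "S"): F(1, 2), ("S", "high", "T"): F(1, 2),
    ("T", "high", "T"): F(1),
}

def chi(sym, w):
    """chi_sigma(W) = W(sigma) / sum_U U(sigma)."""
    return WORDS[w][sym] / sum(u[sym] for u in WORDS.values())

def induce(pr):
    """Transition function of the induced automaton: delta(A, x)(B) = Pr(A -> x B)."""
    delta = {}
    for (a, x, b), p in pr.items():
        delta.setdefault((a, x), {})[b] = p
    return delta

def retract_grammar(pr):
    """Pr_down(A -> sigma B) = sum_W chi_sigma(W) * Pr(A -> W B), nonzero entries only."""
    out = {}
    for (a, w, b), p in pr.items():
        for sym in SIGMA:
            c = chi(sym, w) * p
            if c:
                out[(a, sym, b)] = out.get((a, sym, b), F(0)) + c
    return out

def retract_automaton(delta):
    """delta_down(A, sigma)(B) = sum_W chi_sigma(W) * delta(A, W)(B), nonzero entries only."""
    out = {}
    for (p, w), row in delta.items():
        for sym in SIGMA:
            for q, t in row.items():
                c = chi(sym, w) * t
                if c:
                    cell = out.setdefault((p, sym), {})
                    cell[q] = cell.get(q, F(0)) + c
    return out

m1_down = retract_automaton(induce(PR))   # retraction of the induced PACW
m2 = induce(retract_grammar(PR))          # PACV induced from the retracted grammar
print(m1_down == m2)
```

The sets of final states need no separate check here, since the retraction leaves the probabilities of ε-productions unchanged.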

Based on the above result, it is easy to show that if two PGCWs are equivalent,
then so are their retractions and generalized extensions; this corresponds to the two
right boxes in Figure 3.

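Proposition 6.4. Suppose that $G_w' = (V', \Sigma_w, P_w', S')$ and $G_w'' = (V'', \Sigma_w, P_w'', S'')$ are
two equivalent PGCWs. Then:

(a) $G_w'^{\downarrow} = (V', \Sigma, P'^{\downarrow}, S')$ and $G_w''^{\downarrow} = (V'', \Sigma, P''^{\downarrow}, S'')$ are equivalent.

(b) $G_w'^{\uparrow} = (V', \mathcal{D}(\Sigma), P'^{\uparrow}, S')$ and $G_w''^{\uparrow} = (V'', \mathcal{D}(\Sigma), P''^{\uparrow}, S'')$ are equivalent.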

Proof. We only prove (a). The proof of (b) goes along the same lines as that of (a),
so we omit it. Let $M_w'$ be the PACW induced from $G_w'$. It follows that $M_w'$ and $G_w'$ are
equivalent, and also $M_w'$ and $G_w''$ are equivalent. By Proposition 6.3, we see that $M_w'^{\downarrow}$
and $G_w'^{\downarrow}$ are equivalent, and also $M_w'^{\downarrow}$ and $G_w''^{\downarrow}$ are equivalent. Therefore, $G_w'^{\downarrow}$ and $G_w''^{\downarrow}$
are equivalent, as desired. □

Remark 6.5. When the words are interpreted as possibility distributions, it is not hard
to show that the results given in this section remain true by replacing $\mathcal{D}(\Sigma)$ with $\mathcal{F}(\Sigma)$.

We end this section with the proofs of Theorems 5.3 and 5.4.

Proof of Theorem 5.3. Let $M_w$ be the PACW induced from $G_w$. Then $G_w$
and $M_w$ are equivalent. By Proposition 6.3, we see that $G_w^{\downarrow}$ and $M_w^{\downarrow}$ are equivalent.
Consequently, it follows from Theorem 3.3 that for any $s = \sigma_1 \cdots \sigma_l \in \Sigma^*$,

$$\begin{aligned}
L(G_w^{\downarrow})(s) &= L(M_w^{\downarrow})(s) \\
&= \sum_{W_1, \ldots, W_l \in \Sigma_w} L_w(M_w)(W_1 \cdots W_l) \cdot \prod_{i=1}^{l} \chi_{\sigma_i}(W_i) \\
&= \sum_{W_1, \ldots, W_l \in \Sigma_w} L_w(G_w)(W_1 \cdots W_l) \cdot \prod_{i=1}^{l} \chi_{\sigma_i}(W_i).
\end{aligned}$$

This completes the proof. □

Theorem 5.4 can be easily proved by using Theorem 4.6 and following along the
same lines as the proof of Theorem 5.3.

7 Relationships among retractions, extensions, and generalized extensions

So far, we have seen three kinds of transformations among PACVs, PACWs, and
PACAWs (correspondingly, three kinds of transformations among probabilistic grammars,
PGCWs, and PGCAWs), that is, the extensions in Definition 2.2, the retractions,
and the generalized extensions. In fact, they are related, and we now provide some
relationships in this section.

We first show that the extension given by Definition 2.2 is a special case of the
generalized extension introduced in Section 4.1. To see this, we only need to regard a
PACV as a PACW by identifying an input $\sigma$ with the Dirac distribution $\hat{\sigma}$ for $\sigma$. More
explicitly, for a given PACV $M_v = (Q, \Sigma, \delta_v, q_0, F)$, we identify $M_v$ with a special PACW
$M_w = (Q, \Sigma_w, \delta_w, q_0, F)$, where the two different components are

$$\Sigma_w = \{\hat{\sigma} : \hat{\sigma} \text{ is the Dirac distribution for } \sigma \in \Sigma\} \quad \text{and}$$
$$\delta_w(p, \hat{\sigma}) = \delta_v(p, \sigma) \text{ for any } (p, \hat{\sigma}) \in Q \times \Sigma_w.$$



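It follows from (1) in Definition 3.1 that

$$\begin{aligned}
\delta_w^{\downarrow}(p, \sigma) &= \sum_{W \in \Sigma_w} \frac{W(\sigma)}{\sum_{U \in \Sigma_w} U(\sigma)} \cdot \delta_w(p, W) \\
&= \frac{\hat{\sigma}(\sigma)}{\hat{\sigma}(\sigma)} \cdot \delta_w(p, \hat{\sigma}) \\
&= \delta_w(p, \hat{\sigma}) \\
&= \delta_v(p, \sigma),
\end{aligned}$$

where the second equality holds because $\hat{\sigma}$ is the only word in $\Sigma_w$ that is nonzero at $\sigma$.
Hence, $M_w^{\downarrow} = (Q, \Sigma, \delta_w^{\downarrow}, q_0, F) = M_v$. In other words, once $M_v$ is identified with $M_w$,
the retraction of $M_w$ is $M_v$ itself.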
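The identification of a PACV with a PACW over Dirac distributions can be made concrete as follows. This is a small sketch with hypothetical data and names: each Dirac word is keyed by the symbol it concentrates on, so the transition table of the PACW has the same keys as that of the PACV.

```python
from fractions import Fraction as F

SIGMA = ["a", "b"]
DELTA_V = {  # transition function of a hypothetical PACV with states "p" and "q"
    ("p", "a"): {"p": F(1, 3), "q": F(2, 3)},
    ("p", "b"): {"q": F(1)},
    ("q", "a"): {"p": F(1)},
    ("q", "b"): {"p": F(1, 2), "q": F(1, 2)},
}

def dirac(sym):
    """Distribution that puts all of its mass on `sym`."""
    return {s: F(int(s == sym)) for s in SIGMA}

WORDS = {sym: dirac(sym) for sym in SIGMA}  # each word is named after its symbol
DELTA_W = dict(DELTA_V)                     # delta_w(p, dirac(sym)) = delta_v(p, sym)

def chi(sym, w):
    """chi_sigma(W) = W(sigma) / sum_U U(sigma)."""
    return WORDS[w][sym] / sum(u[sym] for u in WORDS.values())

def retract(delta_w):
    """delta_down(p, sigma) = sum_W chi_sigma(W) * delta_w(p, W), nonzero weights only."""
    out = {}
    for (p, w), row in delta_w.items():
        for sym in SIGMA:
            c = chi(sym, w)
            if c:
                cell = out.setdefault((p, sym), {})
                for q, t in row.items():
                    cell[q] = cell.get(q, F(0)) + c * t
    return out

print(retract(DELTA_W) == DELTA_V)
```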

The next result shows that the concept of generalized extensions is a generalization
of the extensions studied by Qiu and Wang.



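Proposition 7.1. Let $M_v = (Q, \Sigma, \delta_v, q_0, F)$ be a PACV. Then the extension
$\hat{M}_v = (Q, \mathcal{D}(\Sigma), \hat{\delta}_v, q_0, F)$ given by Definition 2.2 is the same as the generalized extension
$M_w^{\uparrow} = (Q, \mathcal{D}(\Sigma), \delta_w^{\uparrow}, q_0, F)$.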
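Proof. It is sufficient to show that $\hat{\delta}_v(p, W') = \delta_w^{\uparrow}(p, W')$ for all $(p, W') \in Q \times \mathcal{D}(\Sigma)$.
By Definition 2.2, we see that $\hat{\delta}_v(p, W') = \sum_{\sigma \in \Sigma} W'(\sigma) \cdot \delta_v(p, \sigma)$. On the other hand, it
follows from Definition 4.1 that

$$\begin{aligned}
\delta_w^{\uparrow}(p, W') &= \sum_{\hat{\tau} \in \Sigma_w} \theta_{W'}(\hat{\tau}) \cdot \delta_w(p, \hat{\tau}) \\
&= \sum_{\hat{\tau} \in \Sigma_w} \Big[ \sum_{\sigma \in \Sigma} W'(\sigma) \cdot \hat{\tau}(\sigma) \Big] \cdot \delta_w(p, \hat{\tau}) \\
&= \sum_{\sigma \in \Sigma} W'(\sigma) \cdot \delta_v(p, \sigma).
\end{aligned}$$

Hence, $\hat{\delta}_v(p, W') = \delta_w^{\uparrow}(p, W')$, as desired. □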
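Proposition 7.1 can be checked on a toy automaton by computing both transition functions for the same input distribution. The sketch is illustrative: the automaton, the input `w_new`, and the function names are hypothetical, and the formula used for the extension is the one quoted in the proof above.

```python
from fractions import Fraction as F

SIGMA = ["a", "b"]
STATES = ["p", "q"]
DELTA_V = {  # transition function of a hypothetical PACV
    ("p", "a"): {"p": F(1, 3), "q": F(2, 3)},
    ("p", "b"): {"q": F(1)},
    ("q", "a"): {"p": F(1)},
    ("q", "b"): {"p": F(1, 2), "q": F(1, 2)},
}
# Dirac words, each named after the symbol it concentrates on.
WORDS = {s: {t: F(int(s == t)) for t in SIGMA} for s in SIGMA}

def extension(p, w_new):
    """Extension as quoted in the proof: sum_sigma W'(sigma) * delta_v(p, sigma)."""
    return {q: sum(w_new[s] * DELTA_V[(p, s)].get(q, F(0)) for s in SIGMA)
            for q in STATES}

def theta(w_new, w):
    """theta_{W'}(W) = sum_sigma W'(sigma) * W(sigma) / sum_U U(sigma)."""
    return sum(w_new[s] * WORDS[w][s] / sum(u[s] for u in WORDS.values())
               for s in SIGMA)

def generalized_extension(p, w_new):
    """sum_W theta_{W'}(W) * delta_w(p, W), with delta_w(p, dirac(s)) = delta_v(p, s)."""
    return {q: sum(theta(w_new, w) * DELTA_V[(p, w)].get(q, F(0)) for w in WORDS)
            for q in STATES}

w_new = {"a": F(1, 3), "b": F(2, 3)}
same = all(extension(p, w_new) == generalized_extension(p, w_new) for p in STATES)
print(same, extension("p", w_new))
```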

Based on Proposition 7.1, we view the extension in Definition 2.2 as a generalized
extension hereafter. As we see from the corresponding figure, there are two approaches
from computing with words to computing with all words: One is the generalized extension (b);
the other is the composition of processes (a) and (c). The next proposition shows that both
approaches yield the same result for the model of probabilistic automata.

Proposition 7.2. Let $M_w = (Q, \Sigma_w, \delta, q_0, F)$ be a PACW. Then $(M_w^{\downarrow})^{\uparrow} = M_w^{\uparrow}$.

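Proof. By definition, $M_w^{\uparrow} = (Q, \mathcal{D}(\Sigma), \delta^{\uparrow}, q_0, F)$ with
$\delta^{\uparrow}(p, W') = \sum_{W \in \Sigma_w} \theta_{W'}(W) \cdot \delta(p, W)$ for any $(p, W') \in Q \times \mathcal{D}(\Sigma)$. In contrast,
$M_w^{\downarrow} = (Q, \Sigma, \delta^{\downarrow}, q_0, F)$, where $\delta^{\downarrow}(p, \sigma) = \sum_{W \in \Sigma_w} \chi_{\sigma}(W) \cdot \delta(p, W)$ for any
$(p, \sigma) \in Q \times \Sigma$. Consequently, the generalized extension of $M_w^{\downarrow}$ is
$(M_w^{\downarrow})^{\uparrow} = (Q, \mathcal{D}(\Sigma), (\delta^{\downarrow})^{\uparrow}, q_0, F)$. By Proposition 7.1 and Definition 2.2, we see that

$$\begin{aligned}
(\delta^{\downarrow})^{\uparrow}(p, W') &= \sum_{\sigma \in \Sigma} W'(\sigma) \cdot \delta^{\downarrow}(p, \sigma) \\
&= \sum_{\sigma \in \Sigma} W'(\sigma) \cdot \Big[ \sum_{W \in \Sigma_w} \chi_{\sigma}(W) \cdot \delta(p, W) \Big] \\
&= \sum_{\sigma \in \Sigma} \sum_{W \in \Sigma_w} W'(\sigma) \cdot \chi_{\sigma}(W) \cdot \delta(p, W) \\
&= \sum_{W \in \Sigma_w} \sum_{\sigma \in \Sigma} W'(\sigma) \cdot \chi_{\sigma}(W) \cdot \delta(p, W) \\
&= \sum_{W \in \Sigma_w} \Big[ \sum_{\sigma \in \Sigma} W'(\sigma) \cdot \chi_{\sigma}(W) \Big] \cdot \delta(p, W) \\
&= \sum_{W \in \Sigma_w} \theta_{W'}(W) \cdot \delta(p, W) \\
&= \delta^{\uparrow}(p, W')
\end{aligned}$$

for any $(p, W') \in Q \times \mathcal{D}(\Sigma)$. That is, $(\delta^{\downarrow})^{\uparrow} = \delta^{\uparrow}$, and thus $(M_w^{\downarrow})^{\uparrow} = M_w^{\uparrow}$, finishing the
proof. □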
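The chain of equalities in this proof can be replayed numerically. The following sketch uses a hypothetical PACW whose words do not sum to one at each symbol, so that the normalisation inside $\chi_{\sigma}$ actually matters; all data and names are assumptions made for illustration.

```python
from fractions import Fraction as F

SIGMA = ["a", "b"]
STATES = ["p", "q"]
WORDS = {  # hypothetical words; their values at "a" sum to 5/6, at "b" to 7/6
    "low":  {"a": F(2, 3), "b": F(1, 3)},
    "high": {"a": F(1, 6), "b": F(5, 6)},
}
DELTA = {  # transition function of a toy PACW
    ("p", "low"): {"p": F(1, 2), "q": F(1, 2)},
    ("p", "high"): {"q": F(1)},
    ("q", "low"): {"p": F(1)},
    ("q", "high"): {"p": F(1, 4), "q": F(3, 4)},
}

def chi(sym, w):
    """chi_sigma(W) = W(sigma) / sum_U U(sigma)."""
    return WORDS[w][sym] / sum(u[sym] for u in WORDS.values())

def theta(w_new, w):
    """theta_{W'}(W) = sum_sigma W'(sigma) * chi_sigma(W)."""
    return sum(w_new[s] * chi(s, w) for s in SIGMA)

def delta_down(p, sym):
    """Retraction: sum_W chi_sigma(W) * delta(p, W)."""
    return {q: sum(chi(sym, w) * DELTA[(p, w)].get(q, F(0)) for w in WORDS)
            for q in STATES}

def delta_up(p, w_new):
    """Generalized extension: sum_W theta_{W'}(W) * delta(p, W)."""
    return {q: sum(theta(w_new, w) * DELTA[(p, w)].get(q, F(0)) for w in WORDS)
            for q in STATES}

def delta_down_up(p, w_new):
    """Extension of the retraction: sum_sigma W'(sigma) * delta_down(p, sigma)."""
    return {q: sum(w_new[s] * delta_down(p, s)[q] for s in SIGMA) for q in STATES}

w_new = {"a": F(1, 3), "b": F(2, 3)}
same = all(delta_down_up(p, w_new) == delta_up(p, w_new) for p in STATES)
print(same, delta_up("p", w_new))
```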

Remark 7.3. Observe that the propositions obtained in this section evidently hold
when the words are interpreted as possibility distributions. By using the properties of
equivalences developed in Section 6, one can derive the results analogous to Propositions
7.1 and 7.2 for the model of probabilistic grammars.
8 Conclusion

In this paper, we have introduced two equivalent probabilistic models of computing
with words via probabilistic automata and probabilistic grammars. The work has been
developed based on the probabilistic distribution interpretation of words; some necessary
modifications have been remarked in order that the results are applicable to the
possibility distribution interpretation of words. The probabilistic models here can easily
incorporate expert knowledge described by propositions in a natural language into
system specification. Taking the finite inputs of the models into account, we established
their retractions and generalized extensions, which have made it possible to compute
with any words. Furthermore, we obtained a retraction principle and a generalized extension
principle showing that computing with values and computing with all words can be
respectively implemented by computing with some special words. Moreover, the retractions
and the generalized extensions proved to be equivalence-preserving. In addition,
we have examined some analytical properties of transition probability functions of the
generalized extensions, which are helpful in comparing the transition probabilities of
two near inputs. Some relationships among the retractions, the generalized extensions,
and the extensions studied recently by Qiu and Wang have also been provided.

There are some limits and directions in which the present work can be extended. As 
mentioned earlier, the generalized extension of a probabilistic model is actually a process 
of interpolation. Thus, a basic problem is how to choose words and how to rationally 
specify their behavior. In turn, one can use many other interpolation approaches to 
cope with the problem of accepting any words as inputs. As a continuation of [2j, this 
work further indicates that building a model for computing with some special words and
then extending it to compute with all words is a generally applicable approach. Therefore, it
is feasible to apply this method to other computational models such as fuzzy grammars 
jS], other probabilistic automata ^\, and fuzzy and probabilistic neural networks (see, 
for example, J17|[T]). A topic of ongoing work concerns the formal model of computing 
with words of many kinds. 